Journal of Applied Psychology 


Kenneth E. Clark, Editor
University of Colorado 





Table of Contents 


The Measurement of Executive Success: Charles L. Hulin
A Short Method for Estimating a Distribution of Consumer Preferences: Purnell H. Benson
Self-Description as a Predictor of Rate of Promotion of Junior Foreign Service Officers: Regis H. Walther
Prediction of College Achievement from the Edwards Personal Preference Schedule at Three Levels of Intellectual Ability: Leonard D. Goodstein and Alfred B. Heilbrun, Jr.
Oral Communication and Sales Effectiveness: R. Wayne Pace
Variability of Performance in a Vigilance Task: T. W. Faulkner


Retest Consistency and the Writing of Life History Items: A First Step: William A. Owens, J. R. Glennon, and Lewis E. Albright
Evaluation of Input Devices for a Data Setting Task: Frank J. Minor and Stanley L. Revesman
Nonrandom Tendencies in Interpolating between End-Points: Richard C. Sorenson and Arnold L. Towe
Vigilance Performance as a Function of Paired Monitoring: Bruce O. Bergum and Donald J. Lehr
Interviewer Consistency in the Use of Empathic Models in Personnel Selection: Daniel Sydiaha
Personality Variables in Union-Management Relations: Ross Stagner
Identification of Cola Beverages: Frederick J. Thumin
Mood Changes during a Management Training Laboratory: Bernard M. Bass
The Relative Importance of Visual and Auditory Feedback in Speed Typewriting: M. Joan Diehl and R. Seibel
Some Differential Effects of Race of Rater and Ratee on Early Peer Ratings of Combat Aptitude: John E. deJung and Harry Kaplan





American Psychological Association 


Volume 46, Number 5 October 1962 





Consulting Editors 


Arthur Brayfield, American Psychological Association
George E. Briggs, Ohio State University
Norman Frederiksen, Educational Testing Service
Leonard D. Goodstein, University of Iowa
Edwin R. Henry, Standard Oil Company of New Jersey
John Holland, National Merit Scholarship Corporation
Clifford E. Jurgensen, Minneapolis Gas Company
Laurence S. McGaughran, University of Houston
Quinn McNemar, Stanford University
Harold F. Rothe, Beloit Corporation
Thomas A. Ryan, Cornell University
Clark L. Wilson, Batten, Barton, Durstine, and Osborn, Incorporated





This journal gives primary consideration to origi- 
nal investigations in any field of applied psychol- 
ogy except clinical psychology, although a de- 
scriptive or theoretical article may be accepted 
if it represents a special contribution in an ap- 
plied field. Quantitative investigations of inter- 
est or value to psychologists working in the fol- 
lowing broad fields will be considered: vocational 
and educational prognosis, diagnosis, and guidance 
at the secondary and college level; personnel re- 
search in business, industry, and government; en- 
gineering psychology; industrial working condi- 
tions; research on opinion and morale factors; job 
analysis and classification research; market and 
advertising research. 


Manuscripts must be accompanied by an ab- 
stract of 100 to 120 words typed on a separate 
sheet of paper. The abstract should conform to 


the style of Psychological Abstracts. Detailed in- 
structions for preparation of the abstracts may be 
obtained from the Editor or from the APA Cen- 
tral Office. 


Manuscripts should be addressed to the Editor: 


Dr. Kenneth E. Clark 

Office of the Dean 

College of Arts and Sciences 
University of Colorado 
Boulder, Colorado 


All manuscripts must be submitted in duplicate. 
Original figures are prepared for publication; dupli- 
cate figures may be photographs or pencil-drawn 
copies. 


Manuscripts must conform to the style require- 
ments described in the Publication Manual of the 
American Psychological Association. 





Journal of Applied Psychology 


Published bimonthly by the 
American Psychological Association 
Prince and Lemon Sts., Lancaster, Pa. 
and 1333 Sixteenth Street N.W. 
Washington 6, D. C. 


$10.00 per volume; $2.00 per issue

Helen Orr, Managing Editor
Elizabeth S. Reed, Advertising Manager
Virginia Richards, Editorial Assistant


Subscriptions, orders, and business communications should be addressed to the American Psychological Association, 1333 Sixteenth St. N.W., Washington 6, D. C. Address changes must reach the subscription office by the tenth of the month to take effect the following month. Undelivered copies resulting from address changes will not be replaced; subscribers should notify the post office that they will guarantee second-class forwarding postage. Other claims for undelivered copies must be made within four months of publication.


Second class postage paid at Lancaster, Pennsylvania and at additional mailing places. 


© 1962 by the American Psychological Association, Inc. 
















THE MEASUREMENT OF EXECUTIVE SUCCESS

CHARLES L. HULIN 1

Cornell University


The definitions and criteria of executive success which have been used in the past show little consistency across studies. The central problem of this investigation was the empirical determination of the relationship between different measures of executive success. Measures of success based on absolute salary, salary increase, and levels promoted were developed. The data gathered from a sample of 50 executives indicated that had one of these measures been used rather than the others, quite different conclusions would have been drawn from the data. A rigorous analysis of the various criteria of executive success is suggested as a solution to the problem of “What is executive success?”

1 Now at the University of Illinois. The writer would like to express his appreciation for the assistance and advice given him on this problem by Henry A. Landsberger and Patricia Cain Smith of Cornell University.

The success or failure of an executive may take any one of several forms. If an executive is “successful” he may be promoted rapidly in his own company, he may receive large salary increases, he may be hired away by another company, or he may be placed in charge of important projects. The prediction of executive success by investigators working in this area requires the understanding of the dimensions along which executive success may vary (cf. Bingham & Davis, 1924; Gifford, 1928; Henry, 1948; Meyer & Pressel, 1954, etc.).

Criterion measures used in validation of executive selection programs and training studies seem to be little understood. The implications of their assumptions, their interrelationships, and their relationships to behavior are, for the most part, unknown. Perhaps as Weitz (1961) has observed:

If we knew more about the functioning of criterional variables, we should be able to predict which criteria are relevant for assessing effects of independent variables and with this knowledge, be able to state more concerning the operation of the independent and the intervening variables.

Little attention has been given to studying the functioning of the criterional variables in this area.

The purpose of this paper will be to investigate the interrelationships of three objective indices of executive success and their relationships to independent variables such as years of education, socioeconomic background, age, and tenure.

A few specific examples should clearly illustrate the contamination by uncontrolled factors of the criterional variables which have been used in past studies. Bingham and Davis (1924) used a combination of salary, investments, debts, and club memberships as an index of success. Salary was used by Gifford (1928) with controls for job tenure and initial salary. Starch (1942), Wald and Doty (1954), Meyer and Pressel (1954), and Henry (1948) all used position in the managerial hierarchy as a measure of success without regard for age or tenure. Less objective indices have included job performance ratings by supervisors (Guilford, 1952) and peer ratings (Rosen & Rosen, 1957)2 without control for rating biases.

A complete discussion of these various indices will not be attempted here. Most of these criteria, however, fail to control for tenure or age of the executives. When absolute salary or position in the managerial hierarchy is used as an index of success, the older executives will be considered more successful regardless of how long it took them to attain their positions or salary. This problem alone could account for the disagreement between Bingham and Davis (1924), who found no relationship between intelligence and executive “success,” and Gifford (1928), who found salary to be related to standing in college class when tenure was equivalent. Whenever measures such as these are used, the variance due to age and tenure must be controlled before any conclusions are made.

2 Although this study was done on union officers rather than industrial executives, the point to be made remains the same.
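One common way to control the variance due to a third variable such as tenure is a first-order partial correlation. The sketch below is illustrative only: the function name and the three input correlations are hypothetical and are not taken from the study, which describes its own control through regression deviations in the Method section.

```python
import math

def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical values, not data from the paper:
# x = salary, y = years of education, z = tenure.
r_salary_education = 0.10
r_salary_tenure = 0.60
r_education_tenure = -0.40

controlled = partial_corr(r_salary_education, r_salary_tenure, r_education_tenure)
print(f"salary-education correlation with tenure held constant: {controlled:.2f}")
```

With these made-up inputs a weak raw correlation becomes a moderate one once tenure is held constant, which is the kind of shift the paragraph above warns about.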

With regard to less objective measures such as supervisory or peer ratings, Campbell (1956), Ben-Avi (1947), and Grant (1955) report findings which indicate that while different ratings correlate with each other, they are seldom highly related to more objective measures. It may be that the correlations between the subjective ratings represent little more than a “method” factor. In any case their lack of relation to objective measures would not seem to be in their favor as indices of success.


METHOD 


Three potential criteria were obtained from each of 50 executives in two companies3 in an effort to construct a reliable and valid index of success. These three indices were based on levels promoted, present salary, and salary increases. The first of these measures was obtained simply by asking the executives how many levels they had been promoted above their starting level. The number of reported “levels promoted” was used as one criterion of success. Its defects are obvious (definition of level, biased reporting, etc.) but it is still potentially informative.

3 Fifteen of these executives were drawn from a small manufacturing company and included all of this company’s management staff. Thirty-five were drawn from a second, larger, manufacturing company. These 35 executives comprised all of the executive staff with more than 10 years’ service at the company.

The first of the salary indices was computed by obtaining the regression of present salary on tenure and obtaining each executive’s percentage deviation from the empirically determined regression line. Algebraically,

\[ \text{Index A} = \frac{\text{present salary} - \text{predicted salary}}{\text{predicted salary}} \]


indicating as more successful the executive who is earning more relative to the other executives in the company who have the same tenure. This index should correlate positively with years of education, socioeconomic background, and starting salary, because of the higher beginning salary of the better educated executives. In spite of regular salary increases the older executives are seldom able to keep up with the younger executives who began work during the postwar boom in industry. Thus Index A would be expected to show a negative relationship with tenure.

The second salary index (Index B) was calculated from the salary data by obtaining the regression of salary increase (present salary minus starting salary) on years with the company. Each executive’s percentage deviation from this regression line was then calculated. Algebraically,

\[ \text{Index B} = \frac{\text{salary increase} - \text{predicted salary increase}}{\text{predicted salary increase}} \]

This index would therefore consider as the more successful executive the one who has increased his salary more relative to the other executives in the company with equal tenure.
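The two salary indices can be sketched in a few lines. This is a minimal illustration assuming an ordinary least-squares line of salary (or salary increase) on tenure; the helper names and the five cases of data are hypothetical and do not come from the study.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for the regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def percentage_deviation(x, y):
    """(observed - predicted) / predicted for each case, from the regression of y on x."""
    slope, intercept = fit_line(x, y)
    predicted = [intercept + slope * xi for xi in x]
    return [(yi - pi) / pi for yi, pi in zip(y, predicted)]

# Hypothetical records for five executives (illustrative only).
tenure = [10, 15, 20, 25, 30]                         # years with the company
starting_salary = [5000, 4500, 4000, 3500, 3000]
present_salary = [12000, 15000, 13000, 18000, 16000]

# Index A: deviation of present salary from the salary-on-tenure line.
index_a = percentage_deviation(tenure, present_salary)

# Index B: the same deviation computed on salary increase.
salary_increase = [p - s for p, s in zip(present_salary, starting_salary)]
index_b = percentage_deviation(tenure, salary_increase)

for t, a, b in zip(tenure, index_a, index_b):
    print(f"tenure {t:2d}: Index A = {a:+.3f}, Index B = {b:+.3f}")
```

A positive value marks an executive who earns (or has gained) more than the line predicts for his tenure; a negative value marks one who earns less.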

Neither of these indices was corrected for the changing “real” value of money. Both were computed, however, for each executive relative to other executives in the company who had equal tenure. It was felt that the effects of such a bias would be negligible.

In addition to these three indices of executive success, the socioeconomic backgrounds of the executives were measured by scaling the responses to a question about their fathers’ occupations. The executives were placed in one of 10 categories: professional, managerial and officials, teachers, small businessmen, low-level supervisors, sales, service, skilled workers, semiskilled workers, and unskilled workers. The scale values for this variable ranged from 10 (highest socioeconomic background) to 1 (lowest). Measures were also obtained for age, years of education, and tenure.

A 7 × 7 correlation matrix was computed for each company. Pearson product-moment correlation coefficients were computed between all variables except when the relationship of levels promoted to any of the other variables was determined; in the latter cases a biserial correlation coefficient was used.


RESULTS 


The two correlation matrices computed from the two companies showed similar patterns of correlations and were combined by means of appropriately weighted Fisher z score transformations. The resulting matrix based on a sample of 50 is presented in Table 1.
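The pooling step can be sketched as follows. The paper says only that the transformations were “appropriately weighted”; weighting each z by n - 3 (the reciprocal of its sampling variance) is an assumption here, as is the function name. The two input correlations are the Index A and tenure coefficients reported in the footnote to the Discussion, read as -.47 and -.68; which value belongs to which company is not stated, so the pairing with sample sizes of 15 and 35 is also an assumption.

```python
import math

def combine_correlations(rs, ns):
    """Pool correlations from independent samples through Fisher's z.

    Each r is transformed with atanh, the z values are averaged with
    weights n - 3 (an assumed weighting), and the mean is transformed back.
    """
    zs = [math.atanh(r) for r in rs]
    weights = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(z_bar)

# Assumed pairing of the two reported coefficients with the two company sizes.
pooled = combine_correlations([-0.47, -0.68], [15, 35])
print(f"pooled correlation: {pooled:.2f}")
```

Because the larger company carries more weight, the pooled value lies closer to its coefficient than a simple average of the two would.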

As predicted, Index A (based on salary earned) is significantly correlated with years with the company (negatively) and with years of education and socioeconomic background (positively). Index B, however, is related to none of these variables. Index A is also significantly related to starting salary (p < .01). The correlation of Index B, however, was only .01 with starting salary. In spite of these obvious differences, Index A and Index B showed a correlation of .83.

TABLE 1
COMBINED CORRELATION MATRIX
(Companies 1 and 2)

Variables: 1. Age; 2. Years with company; 3. Years of education; 4. Socioeconomic background; 5. Index B; 6. Index A; 7. Levels promoted
A cluster analysis (McQuitty, 1957) of this matrix isolated two underlying clusters. This method of elementary linkage analysis has been demonstrated to be quite adequate as a means of determining the number and nature of the underlying factors which would be isolated if a centroid analysis were done on a correlation matrix. The first of these clusters included Index A, Index B, years of education, and socioeconomic background. The second cluster included age, years with the company, and levels promoted. Both the number of variables and the number of cases studied are too limited to allow any great reliance to be placed on these clusters but they are nevertheless suggestive.

DISCUSSION 


The results of this study indicate that the three measures of success studied would actually give very different results if used as the sole criterion of executive success. If Index A were used as the sole criterion of executive success, the results of the study would indicate that the well-educated person with the higher socioeconomic background would become the more successful executive. (This general finding has been reported by Newcomer, 1955; Wald & Doty, 1954; Warner & Abegglen, 1955.) The study might not, however, give a true picture of the situation. The contamination of Index A with tenure would make such conclusions tenuous at best.4 It should also be noted that, in spite of its shortcomings, Index A probably shows less contamination with tenure or age than other criteria of success based on salary which have been used in past studies. Index B, which showed less contamination than Index A with both starting salary and tenure, was unrelated to both socioeconomic background and education. Thus if this criterion had been used, one would have concluded that family background and education are not valid predictors of success. Yet both of these indices seem to be plausible. In this area in which the results of any study are so dependent on the investigators’ choice of a criterion, a large-scale study of criterion measures, their interrelations, and perhaps their underlying factor structure should be undertaken. If this study were done, the writer would predict that a success factor would have high loadings of absolute salary, number of levels promoted, number of men ultimately supervised, position in the hierarchy, and rate of salary increase. Further, supervisory and peer ratings should have low loadings on this factor.

4 Even though an attempt was made to partial out the effects of tenure by taking percentage deviations from the regression line of salary on years with the company, the final results indicate that this control was not completely successful. In both companies Index A was significantly related to tenure (correlation coefficients of −.47 and −.68). It would thus appear that the older executives who started work several years ago will never “catch up” in terms of salaries with their younger colleagues who started working during the past few years of extreme competition for executive material. This finding is supported by the findings of Hilton and Dill (1960).

The use of salary data as criteria of success has received more than its share of criticism from many different writers. It can be argued, however, that a linear combination of variables which includes salary as a central variable is a true measure of success. (a) Although no direct computation of reliability estimates of Index A or Index B was done, since the components are inherently reliable (starting salary, present salary, and tenure were all obtained from the personnel files and thus were not subject to biased reporting) any nonrandom combination of these components will be reliable in the sense of having no error of measurement. The reliability over time has not yet been studied. (b) The first cluster of variables could be considered as some evidence of construct validity as discussed by Campbell (1960). The clustering technique used places a variable in a cluster if it correlates higher with a variable already in the cluster than with any other variable in the matrix. Thus the first cluster indicates that the four variables (Index A, Index B, socioeconomic background, and years of education) were all measuring to a limited extent the same underlying or source factor. Further, previous studies have indicated that these latter two objective variables are commonly related to success. Admittedly this clustering is slim evidence for construct validity and does not prove that salary measures are the best criteria available. It does make the use of these measures more tenable. (c) Although executives are poor at giving reliable and unbiased ratings of success to their peers or subordinates, perhaps when ratings represent dollars rather than mere slips of paper, the result is a more valid indication of the executive’s worth to the company. (d) Finally, when comparisons to be made are restricted to executives within the same company, there are few variables external to the company to invalidate the comparisons.

For the above reasons a linear combination of certain objective variables with a salary measure (corrected for tenure) as the central variable would seem to be preferred to ratings, raw salary, or position in the hierarchy as a measure of executive success.


REFERENCES 


Ben-Avi, A. H. Studies of subjective measurement of flying proficiency. In N. E. Miller (Ed.), Psychological research on pilot training. (Res. Rep. No. 8) Washington, D. C.: AAF Aviation Psychology Program, 1947. Pp. 108-109.

Bingham, W. V., & Davis, W. T. Intelligence test scores and business success. J. appl. Psychol., 1924, 8, 1-22.

Campbell, D. T. Leadership and its effect on the group. Ohio St. U. Stud., Bur. Bus. Res. Monogr., 1956, No. 83.

Campbell, D. T. Recommendations for APA test standards regarding construct, trait, or discriminant validity. Amer. Psychologist, 1960, 15, 546

Gifford, W. S. Does business want scholars? Harpers Mag., 1928, 156, 669-674.

Grant, D. L. A factor analysis of managers’ ratings. J. appl. Psychol., 1955, 39, 283-286.

Guilford, Joan. Temperament traits of executives and supervisors measured by the Guilford Personality Inventories. J. appl. Psychol., 1952, 36, 228-233.

Henry, W. E. Executive personality and job success. AMA personnel Ser., 1948, No. 120.

Hilton, T. L., & Dill, W. R. Salary growth as a measure of career progress. Paper presented at American Psychological Association annual meetings, Chicago, 1960.

McQuitty, L. L. Elementary linkage analysis for isolating orthogonal and oblique types and typal relevancies. Educ. psychol. Measmt., 1957, 17, 207-229.

Meyer, H. D., & Pressel, G. L. Personality test scores in the management hierarchy. J. appl. Psychol., 1954, 38, 73-80.

Newcomer, Mabel. The big business executive: The factors that made him, 1900-1950. New York: Columbia, 1955.

Rosen, H., & Rosen, R. A. H. Personality variables and role in a union business agent group. J. appl. Psychol., 1957, 41, 131-136.

Starch, D. An analysis of the careers of executives. Psychol. Bull., 1942, 39, 435.

Wald, R. M., & Doty, R. A. The top executive: A first-hand profile. Harvard bus. Rev., 1954, 32(4), 45-54.

Warner, W. L., & Abegglen, J. Big business leaders in American society. New York: Harper, 1955.

Weitz, J. Criteria for criteria. Amer. Psychologist, 1961, 16, 228-231.

(Received July 28, 1961)





A SHORT METHOD FOR ESTIMATING A DISTRIBUTION OF CONSUMER PREFERENCES 1

PURNELL H. BENSON 2

Swarthmore, Pennsylvania

A brief questionnaire and statistical procedure is described for estimating the statistical distribution of consumer preferences for product variations along a single qualitative dimension. The fitting of data to ogive curves for skewed distributions is illustrated. Implications of distributions of consumer preferences for product planning and marketing strategy are suggested.

1 This paper reports research undertaken under contract with the Marketing Research Division of the Scott Paper Company.

2 Complete address is 140 Rutgers Avenue, Swarthmore, Pennsylvania.

When consumers are alike in what they want, a procedure for finding their optimum point along a range of variation in product quality consists in establishing an empirical curve relating the degree of like or dislike to changes in the quality, and finding the highest point on this curve if cost of providing the quality can be ignored (Benson, 1955, 1958, 1961). If cost must be considered, the point must be found on the curve of rising satisfaction where the gain in satisfaction to the consumer of more of the quality just equals the loss in satisfaction from paying more money.

CONSUMERS WHO DIFFER

When consumers differ in the optimum point of quality they desire to buy, such as how wide a food wrap they would rather buy, what is sought is a knowledge of the numbers of consumers who pick their point of highest preference at various places along the qualitative dimension. This description is provided by a statistical distribution curve, as in Figure 1. The height of the vertical column at a particular point under such a curve gives the relative number of consumers whose preference is to pay for a particular width of food wrap. If the total area under the curve represents all consumers, then the area of the column gives the fraction of consumers whose preference is to buy the width of food wrap where the column is located.

The required distribution curve could be constructed by providing consumers with many possible product forms from which to choose and letting them select the one form whose point of quality, such as width, most nearly provides what they wish to buy. Even with short-cut methods for reducing the labor of paired comparisons (Torgerson, 1958) two difficulties remain. The preparation of product samples for all possible degrees of quality is costly. The trying out of many different product forms is time consuming and fatiguing for the respondents. The practical research problem may be to approximate the distribution curve with a minimum of product forms tried out by consumers.


ASSUMPTION OF NORMAL STATISTICAL DISTRIBUTION

Statistical distributions are often satisfactorily approximated by the usual normal, bell-shaped curve with a single hump and tailing off at the extremes where few consumers can be found. Unless the distribution is believed to have more than one peak, application of the normal curve affords an initial approximation of considerable usefulness to the product planner. If as few as two or three samples are tried out with consumers, sufficient information can be obtained to show how spread out is the distribution of consumers along the qualitative dimension from which individual consumers select points of preference.

If the focus of research interest is upon how consumers react to qualitative changes irrespective of brand identification, the product samples should be administered unlabeled. The products should be identical except for the price and for the single quality along which the distribution of consumer preferences is to be mapped. If consumers are asked to report their buying choice between the two samples, this information divides the distribution between those preferring Sample A and Sample B. The area to the left of the vertical line represents the proportion choosing A, and the area to the right the proportion choosing B.

Fig. 1. Distribution of preferences for widths of food wrap. (Horizontal axis: width of food wrap; areas labeled “Prefer A” and “Prefer B,” with boundaries marked at -.33 and -.02 sigmas.)

If Products A and B are food wraps 
which are 10 inches and 14 inches in width, 
the percentage choosing A and the percentage 
choosing B indicate approximately how many 
consumers wish to use a food wrap smaller 
than 12 inches wide and how many prefer 
to use one which is larger than 12 inches 
wide. Still unanswered would be the question 
of whether these consumers include some 
who would prefer a food wrap as narrow as 
6 inches or as wide as 18 inches. In other 
words, the extent to which consumers are 
spread out is still not defined. 

To achieve this information it becomes 
necessary either to expose consumers to three 
samples or else to ask consumers in a two- 
sample study the following type of question: 
“Would you rather buy a food wrap which 
is narrower than Sample A, about as wide as 
Sample A, or wider than Sample A?” Alter- 
natively, consumers may be asked: “Is 
Sample A too narrow, too wide, or just about 
right?” The same type of question is asked 
concerning Sample B. 


USE OF THE NORMAL TABLE

The following data are inserted to show the procedure for establishing the distribution curve when consumers have been asked whether they desire food wraps with widths less than, about the same as, or larger than the two samples of 10 and 14 inches. (These data have been altered from original to obtain permission to publish.)

                          Sample A          Sample B
                          10-inch width     14-inch width

Prefer less width             15%               63%
Prefer about the same          5%                9%
Prefer more width             80%               28%


The usual normal table records the area under a normal curve between one end of the curve and various points along the horizontal axis beyond the midpoint. The distances between these points and the midpoint of the curve are measured in terms of the standard deviation of the area of the curve or “sigma” as the metric unit. By looking up in the table the percentage of consumers in the area to the left of Product A and the percentage to the left of Product B, it becomes possible to fix Points A and B in terms of the number of sigmas between them and the midpoint of the distribution. The spread of the distribution curve along the horizontal axis for width of the food wrap is then apparent.

The distance along the quality axis corresponding to the 15% of respondents who report they prefer a food wrap with less width than Sample A is -1.04 sigmas from the midpoint of the distribution. This is found from the normal table by noting that the distance corresponding to 85% is +1.04 sigmas. The table involved (Burington, 1958) does not list the horizontal axial distances for percentages less than 50 since these are the counterparts of the complementary percentages greater than 50. The distance of -1.04 sigmas from the midpoint is the distance for the boundary between the 15% who say they prefer less width than 10 inches and the 5% who wish a width about the same as 10 inches. In each case, the distance along the axis for the quality is the boundary between the successive percentages of consumers reporting their preferences.

The distances in sigmas corresponding to areas under the normal curve up to the boundaries of the percentages of consumers reporting their preferences are marked on the curve in Figure 2.

Fig. 2. Boundaries of percentages of consumers reporting preferences. (Horizontal axis: width of food wrap, 8 to 16 inches; areas labeled “Prefer narrower than A,” “Prefer between A and B,” and “Prefer wider”; sigma distances marked at -1.04, -.94, -.84, .33, .46, and .58.)

To determine the percentage of those preferring a food wrap narrower than A after the “about the same” group of respondents are split in their choice between less width and more width, first the midpoint in sigmas of the “about the same” group is located as halfway between -1.04 sigmas and -.84 sigmas. Corresponding to the location of -.94 sigmas for the midpoint for the group desiring about the same width as Sample A, the area up to this point from the normal table is 17%. This is the percentage who are assumed to desire a food wrap narrower than 10 inches rather than larger than 10 inches if an absolute choice is imposed. Similarly the percentage actually preferring a food wrap narrower than 14 inches rather than wider than 14 inches is 68%, corresponding to .46 sigmas. Presumably, these are percentages of consumer replies which would be obtained if a forced-choice question were used in the survey. For questioning which is more acceptable to respondents, the category of “about the same” or “don’t know” is introduced in the interview.
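The table lookups above can be reproduced with the inverse of the standard normal cumulative distribution. The sketch below uses Python's `statistics.NormalDist` in place of the printed table; the function name is hypothetical, and the percentages are those given in the data table for Samples A and B.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation ("sigma") 1

def forced_choice_share(pct_less, pct_same):
    """Split the 'about the same' group at its sigma midpoint.

    Returns the two boundaries in sigmas, their midpoint, and the area
    under the normal curve up to that midpoint.
    """
    lower = STD_NORMAL.inv_cdf(pct_less)             # less | about the same
    upper = STD_NORMAL.inv_cdf(pct_less + pct_same)  # about the same | more
    midpoint = (lower + upper) / 2
    return lower, upper, midpoint, STD_NORMAL.cdf(midpoint)

sample_a = forced_choice_share(0.15, 0.05)   # 10-inch sample
sample_b = forced_choice_share(0.63, 0.09)   # 14-inch sample

for label, (lo, hi, mid, share) in (("A", sample_a), ("B", sample_b)):
    print(f"Sample {label}: boundaries {lo:+.2f} and {hi:+.2f} sigmas, "
          f"midpoint {mid:+.2f}, share preferring less width {share:.0%}")
```

Rounded to two places the boundaries and midpoints agree with the sigma distances quoted in the text, and the resulting shares round to the 17% and 68% reported there.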

As a check upon the work, and in particular 
as a check upon whether the assumption 
of a normal distribution curve is justified, 
the replies from choosing between Samples 
A and B are used. These replies are recorded
as 37% selecting Sample A, 51% selecting
Sample B, and 12% giving “don’t know” replies.
The boundaries between these groups
as given by areas in the normal table are, in
Figure 1, respectively, -.33 sigmas and -.02
sigmas. The midpoint of -.18 sigmas between
these boundaries is the implied dividing
line between those choosing Sample A
and those choosing Sample B in forced
choice. In this case the result differs somewhat
from -.24 sigmas, which is the midpoint
between -.94 sigmas and .46 sigmas
from the first set of results. The discrepancy
is not large enough to indicate much de- 
parture from normality. 
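
The consistency check can be sketched in the same way. This is an illustration only: it assumes `statistics.NormalDist` in place of the printed table, so the upper boundary comes out slightly different from the rounded -.02 read from the table, and the variable names are hypothetical.

```python
from statistics import NormalDist

table = NormalDist()  # standard normal, standing in for the printed table

# Replies when choosing between Samples A and B, as reported in the text.
share_a, share_dont_know, share_b = 0.37, 0.12, 0.51

# Boundaries of the "don't know" band on the cumulative scale, in sigmas.
lower = table.inv_cdf(share_a)                    # about -.33
upper = table.inv_cdf(share_a + share_dont_know)  # close to the tabled -.02

# Implied forced-choice dividing line from the A-versus-B replies.
midpoint_check = (lower + upper) / 2

# Dividing line implied by the first set of results (-.94 and .46 sigmas).
midpoint_first = (-0.94 + 0.46) / 2

print(round(midpoint_check, 2), round(midpoint_first, 2))  # -0.18 -0.24
```

The gap between the two midpoints is the discrepancy the text judges too small to indicate much departure from normality.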




ESTIMATION OF SKEWED DISTRIBUTION CURVE


The skewed distribution curve can be
estimated from the replies of consumers who
choose between two samples and who also
locate their preferences on one side or another
of each of the samples. The extent to
which the distribution curve for width of
food wrap is skewed depends upon the relative
numbers of consumers added to the
cumulative total when the samples of food
wrap become wider. If, as the food wrap
becomes wider, some consumers continue to
be added per inch of successive width, the
resulting curve would necessarily taper with
skewness to the right.

In the data under consideration, the 
cumulative number of consumers who wish a 
food wrap with a particular width or less is 
known for three points: 10 inches (Sample 
A), 12 inches (halfway between Samples A 
and B), and 14 inches (Sample B). Dividing
the no opinion groups in the reported data
equally, the cumulative percentages of con- 
sumers for the three points are, respectively, 
17.5, 43, and 67.5. 

Graphs for skewed distribution curves
were prepared which show the cumulative
percentages of consumers who prefer a product
with as much as or less than a given
amount of a quality. These curves were prepared
by multiplying the abscissas of the
normal cumulative distribution curve by e^(ax),
where x is the abscissa in standard deviation
units and a is a constant which determines
the degree of skewness. The curves are given
in Figure 3. If a is zero, the skewness is zero
and the distribution curve is normal; for
a = .05, skewness is .06; for a = .10, skewness
is .12; for a = .15, skewness is .18; for
a = .20, skewness is [illegible]; for a = .25,
skewness is [illegible]; for a = .30, skewness
is .38; for a = .35, skewness is .46; and for
a = .40, skewness is .52.












































Fig. 3. Ogive curves for skewed distributions. (Vertical axis: accumulated percent; horizontal axis: deviation in sigmas of normal curve.)





The graphs of the cumulative curves are
used to find the best fit for the data of consumer
replies. The horizontal axis of the
curves is marked to indicate equal increments
in quality. The unit of measurement
used is taken as the standard deviation of
the normal curve. To find out which one of
the cumulative curves best fits the data for
the cumulative percentages at three points
of quality, two of which are equidistant from
the third, horizontal lines are drawn on a
sheet of tracing paper for the three cumulative
percentages. The transparent sheet is
then superimposed upon the cumulative
curves to learn by inspection which of
these curves more nearly intersects the horizontal
lines at equidistant points along the
horizontal axis. The procedure is illustrated
in Figure 4. For paired comparison of three
product samples not equally spaced in width,
the intersections with the horizontal lines
would be at distances along the horizontal
axis proportionate to the two intervals in
width between the boundaries dividing the
three pairs of samples.

Fig. 4. Graphical procedure for fitting ogive curve to cumulative preferences. (Vertical axis: accumulated percent; horizontal axis: deviation in sigmas of normal curve.)

For the food wrap data under considera- 
tion, the best fit is found between cumulative 
curves for a = .10 and a = .15. For a = .10, 
the quality points for A, midpoint, and B are 
found to be -.84, -.16, and +.48. For
a = .15, the quality points are found to be
-.80, -.17, and +.49. By interpolation we
estimate the proper value of a to be .13 and 
the corresponding skewness as .16. 
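
The tracing-paper fit can also be approximated numerically. The sketch below is an assumption-laden stand-in for the graphical procedure, not the author's method: it takes the skewed abscissa to be x multiplied by e^(ax), uses `statistics.NormalDist` for the normal table, and searches by bisection for the value of a at which the three quality points are equally spaced. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def quality_point(cum_share, a):
    """Abscissa of the skewed ogive at a given cumulative share.

    Assumes the skewed curve is built by multiplying the normal abscissa x
    (in sigmas) by e^(a*x), with a controlling the degree of skewness.
    """
    x = NormalDist().inv_cdf(cum_share)
    return x * math.exp(a * x)


def spacing_gap(a, shares):
    """Upper interval minus lower interval between the three quality points."""
    low, mid, high = (quality_point(s, a) for s in shares)
    return (high - mid) - (mid - low)


def fit_skew_constant(shares, lo=0.0, hi=0.5, tol=1e-6):
    """Bisect for the a at which the three points are equally spaced."""
    if spacing_gap(lo, shares) * spacing_gap(hi, shares) > 0:
        raise ValueError("no sign change in the search interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if spacing_gap(lo, shares) * spacing_gap(mid, shares) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# Cumulative shares at 10, 12, and 14 inches, taken from the text.
shares = (0.175, 0.43, 0.675)
a_hat = fit_skew_constant(shares)
print(round(a_hat, 2))
```

Equal spacing is the right target here only because the three widths (10, 12, and 14 inches) are themselves equally spaced. With these inputs the search settles close to the a = .13 obtained by interpolating between the plotted curves.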

The possibility of an irregular shaped
distribution with more than one mode may
arise when there are multiple uses of a product.
The distribution obtained from two or
three sample points should be checked with
the aid of additional product samples unless
an initial estimate of the character of the
distribution is all that is required.

Fig. 5. Illustration of product locations along a quality dimension.

USE OF THE CONSUMER DISTRIBUTION CURVE

For heterogeneous consumers described by
the distribution curve, the question arises
of what product form or forms can be marketed
for maximum sales return. Theoretical
assumptions for using distribution functions
to estimate proportions of consumer choices
are contained in the work of L. L. Thurstone
(1959). Ordinarily, the distribution function
for a single quality dimension provides insufficient
information by itself for concluding
what product forms should be offered for
maximum satisfaction to consumers. Distributions
of consumer preferences for other
qualities are also involved.

Considering a single quality dimension, 
comments may be made which indicate the 
nature of other problems involved than that 
of multidimensionality. If only one width of 
food wrap is to be prepared, and consumers 
not satisfied by that width object as little to 
a food wrap too wide as to one too narrow, 
then a single product form would be located 
at the center or mode of the distribution 
curve where the largest number of consumers 
have their optimum. If some consumers can 
only use a minimum width of food wrap the 
single food wrap should be at least as wide 
as the minimum for acceptance by them. 

If more than one product form is to be
made available, two kinds of question are
involved. One of these involves the decision
of where on the quality dimension to locate
the particular quality forms, if it is economical
to manufacture and sell more than one.
The other type of question is that of determining
whether sales return is sufficient to
justify the manufacture of an added product
form. Answers to the latter question involve
cost accounting peculiar to different product
classes. Concerning the location of different
product forms along a single quality dimension,
if it is economical to provide them,
some comments with general application may
be made referring to Figure 5.

If a second Product Form B enters the 
market already served by Product Form A, 
then, other things equal, the new brand 
may be expected to receive a share of the 
market equal to the area under the curve to 
the right of a line between A and B. The 
rest of the market will go to A. Whether the 
dividing line is midway between A and B 
depends upon whether too much and too 
little of the quality are equally acceptable. 
If a food wrap cannot be used when its width 
is too small, the dividing line will be nearer 
the narrow sample. This can be illustrated 
by Figure 6. In this figure the dotted curve 
represents the drop in acceptance of greater 
width or less width for an individual user, X, 
who is indifferent between Food Wraps A 
and B. This is a user whose individual ac- 
ceptance curve drops by equal amounts for 
the choice of either A or B. Any users to 
the right of X will prefer B to A, and any 
users to the left of X will prefer A to B. 
The individual acceptance curve can be 
learned by observing the changes in like or 
dislike for changes in width and fitting a 
function to the experimental data (Benson,
1957). The peakedness or flatness of the
acceptance curve is important in evaluating
how strongly consumers feel about variations
in product quality and whether segments
of the market will respond by buying a new
product form.

Fig. 6. Relationship of product choices to product acceptance. (Curves shown: distribution curve and individual acceptance curve for user X.)

Referring again to Figure 5, if Product B
changes its quality and moves closer to A,
an increasing share of the market will be
reached by B. If a third product form is
added at C, midway between A and B, it will
have a potential market equal to the area
between the midpoints of A and C, and C
and B. In market strategy, if Products A
and B move in closer to C, then C will be
squeezed in its share unless the products
cannot be distinguished by consumers or
unless brand loyalty or other qualitative appeals
continue to hold consumers to C. If
this squeeze should be threatened, Product C
may find it advantageous to shift its location.
These comments indicate some of the
possibilities for planning and marketing new
product forms after consumer distribution
curves have been established through research
and existing brands have been located
on the axes of these distributions. 
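
The market-share reasoning above can be sketched as areas under the distribution curve between dividing lines. This is a simplified illustration, not a model given in the article: it assumes a normal distribution of consumer optima, dividing lines exactly midway between adjacent products (too much and too little of the quality equally acceptable), and hypothetical product locations in sigmas. The function name `market_shares` is also hypothetical.

```python
from statistics import NormalDist


def market_shares(locations, distribution=NormalDist()):
    """Share of consumers falling nearest each product on one quality dimension.

    Assumes dividing lines midway between adjacent products and consumer
    optima distributed according to `distribution` (normal by default; a
    skewed curve would need its own cumulative function).
    """
    ordered = sorted(locations.items(), key=lambda item: item[1])
    # Dividing lines midway between each pair of neighbouring products.
    cuts = [(a[1] + b[1]) / 2 for a, b in zip(ordered, ordered[1:])]
    # Cumulative area at each dividing line, padded with 0 and 1 at the ends.
    cum = [0.0] + [distribution.cdf(c) for c in cuts] + [1.0]
    return {name: cum[i + 1] - cum[i] for i, (name, _) in enumerate(ordered)}


# Hypothetical locations in sigmas: A and B at the points found earlier
# for the two food wraps, C placed midway between them.
two_products = market_shares({"A": -0.94, "B": 0.46})
three_products = market_shares({"A": -0.94, "C": -0.24, "B": 0.46})

print({k: round(v, 2) for k, v in two_products.items()})
print({k: round(v, 2) for k, v in three_products.items()})
```

Moving A and B toward C in the `locations` argument narrows the band between the two dividing lines, which is the squeeze on C described in the text.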


REFERENCES 

Benson, P. H. A model for the analysis of consumer
preference and an exploratory test. J. appl.
Psychol., 1955, 39, 375-381.

Benson, P. H. Optimizing product acceptability
through marginal preference analysis. In, Quality
control and the consumer conference. New Brunswick,
N. J.: Rutgers Univer. Press, 1957. Pp. 67-94.

Benson, P. H., & Peryam, D. R. Preference for
foods in relation to cost. J. appl. Psychol.,
1958, 42, 171-174.

Benson, P. H., & Pilgrim, F. J. Testing less desirable
product possibilities. J. Market., 1961, 25,
65-68.

Burington, R. S. Handbook of mathematical tables
and formulas. Sandusky, O.: Handbook, 1958.

Thurstone, L. L. The measurement of values.
Chicago: Univer. Chicago Press, 1959.

Torgerson, W. S. Theory and method of scaling.
New York: Wiley, 1958. Pp. 191-194.


(Early publication received February 16, 1962) 





Journal of Applied Psychology
1962, Vol. 46, No. 5, 314-316


SELF-DESCRIPTION AS A PREDICTOR OF RATE 
OF PROMOTION OF JUNIOR FOREIGN 
SERVICE OFFICERS 


REGIS H. WALTHER 


Department of State 


An experiment to determine the relationship between responses to an attitude,
interests, and background questionnaire, and rate of promotion of Foreign
Service Officers. Ss, who had completed an 89-item questionnaire May 1958,
were divided first into an experimental group (N = 20) and a cross-validation
group (N = 49), and then into high and low groups on the basis of promotions
as of January 1961. Of the scales developed from previous studies, only the
Social Isolation one proved to be useful. However, 2 new elements, Optimism
and Self-Potency, proved to be effective measures for predicting the criterion.
A correlation of .60 was found between the combined scores on these
elements and speed of promotion for the cross-validation group.


Walther (1961b) reported a study showing
that responses given by Foreign Service
clerical personnel to questions about attitudes,
interests, activities, and family background
could be used to predict success or failure in
various clerical jobs in the Foreign Service.
The questionnaire was revised in an effort
to measure more accurately the behavioral
style and capability elements which emerged
from this study. The present study was
designed to determine whether responses given
by junior Foreign Service officers could be
used to predict rate of promotion.


METHOD 


In May 1958, 69 junior Foreign Service officers
completed the revised 89-item multiple-choice questionnaire
including questions relating to grades and
interests in school, job characteristics most liked
or disliked, relationship with parents, social activities,
steadiness of employment, hobbies or outside
interests, etc. Examples of additional questions are:

Do you feel that you are left out of things, perhaps
intentionally, in group activities?

a. Never
b. Seldom
c. Sometimes
d. Frequently
e. Almost always

Do you:

a. Work best under a great deal of pressure and
tight deadlines
b. Prefer to work at an even pace but are also
able to work well under pressure
c. Prefer not to work under pressure but are
able to meet most reasonable deadlines
d. Do your worst work if unreasonable pressure
is put on you

Are your political, religious, and social views:

a. Almost identical with those of your parents
b. Similar to those of your parents
c. Different in some important respects from those
of your parents
d. Very substantially different from those of your
parents
e. Different in almost every important respect
from those of your parents


All the junior Foreign Service officers attending
training classes at the Foreign Service Institute
completed the questionnaire during May 1958.
Twenty of them had entered on duty during the
summer of 1956 and 49 during the spring of 1958.
The subjects were then divided into high and low
groups. For the officers hired in 1956 the high
group consisted of officers promoted twice by
January 1961 and the low group of all the remaining
officers. For the officers hired in 1958 the high
group consisted of officers promoted once by January
1961 and the low group of officers who had not
been promoted or who had resigned. Two subjects
were eliminated from the study. One was a woman
officer who resigned to get married and the other
was on military furlough. Scoring keys for the
behavioral style and capability elements which had
emerged from the study of Foreign Service clerical
jobs and the study of six occupational groups were
applied (Walther, 1961a, 1961b). Only the Social
Isolation key developed useful differences (see
Table 1). The next step was to try to account for
the other statistical difference which had been found
between high and low categories for the 1956 group.
It was hypothesized that two elements (Optimism
and Self-Potency) would account for most of the
difference and items were selected and weighted
on the basis of their logical relationship to the
hypothesis. An item analysis was made and items
which failed to correlate with the total scale were
eliminated. The three scales were then combined so
as to create the maximum numerical spread between
the high and low categories for the 1956 group.
The results are contained in Table 1 and the
intercorrelations of the scales in Table 2.

TABLE 1
COMPARISON OF SCORES ON INDIVIDUAL ELEMENTS TO RATE OF PROMOTION
[Columns: high and low promotion groups within the 1956 group and the 1958 group. Rows: Optimism, Self-Potency, Social Isolation, Combined score. Tabled values are not recoverable; the note on how scores were combined survives only as “Scores were combined ... for every point below ...”.]
* p < .05.
** p < .01.

TABLE 2
INTERCORRELATIONS AMONG ELEMENTS
[Elements tabled: Optimism, Social Isolation, Self-Potency. Only one entry, 11, in the Social Isolation row, is recoverable.]


The criterion used for measuring success was promotion
rate. The promotion system used for Foreign
Service officers involves a yearly review of the personnel
files of all officers in the class by a panel of
four experienced officers each of whom reviews each
file which contains supervisory ratings, ratings by
Inspectors and end-users, plus any commendations
or reprimands. A written guide is given to each
Panel member of the promotion standards which
are to be used. Promotions are then made in the
order in which the officers are ranked by the Promotion
Panel.


RESULTS 

The Self-Potency element gave the most
consistent results for both the 1956 and 1958
groups. The Optimism element was significant
for the 1956 group and close to significant
for the 1958 group. A biserial correlation of
.64 for the 1956 group and .60 for the 1958
group was found between the combined score
for the three elements (Self-Potency, Optimism,
and Social Isolation) and promotion
rate.


DISCUSSION 


Foreign Service officers are a carefully
screened group and are selected on the basis
of both a written and oral examination. Only
about 5% of the applicants are offered appointments.
It can be expected, therefore,
that this occupational group will be unusually
homogeneous. In the study comparing
Foreign Service officers to stenographers,
nurses, policemen, physicists, and ministers
it was found that the Foreign Service officers
scored high on the Social Leadership and the
Influence Others scales (Walther, 1961a).
The group included in this study was also
very high on these elements. In response to
the question “What do you like best in
work?”, 79% said work through which they
can influence others, 10% work involving
competition, 1% work requiring accuracy in
fine detail, 3% steady work without frequent
interruptions, and 7% work through which
they can please others. In response to the
question “If you were asked to be an officer
in an organization would you prefer to
be:” 75% said president, 18% vice president,
1% secretary, 5% treasurer, and 1% prefer
not to be an officer. Only 4% say they almost
never entertain groups at home. It seems
likely, therefore, that the reason the Influence
Others and Social Leadership scales did
not predict promotions was the
great restriction in the range.






The following assumptions have been de- 
veloped regarding the specific meaning of the 
two elements which seem to predict promo- 
tions of Foreign Service officers: 

Optimism. This element relates to the de- 
gree to which the individual has a conviction 
that the intentions of other people are 
basically benevolent, and that satisfaction 
can be easily obtained from many things, as 
opposed to the conviction that other people 
cannot be fully trusted and that it is difficult 
to get satisfaction from the world. Examples 
of positive items defining the element are: 
considers himself a lucky individual, always 
has something to do during his spare time, 
almost all people like him, and he is never 
intentionally left out of group activities. 
Examples of negative items are: he usually 
has not had a supervisor who praises him 
and gives him credit for work which he does 
well, his jobs usually have not been interest- 
ing, and his supervisors for the most part 
have not been usually helpful and under- 
standing. Perhaps this element is related to 
Cattell’s Surgency trait (Cattell, 1946). 

Self-Potency. This element relates to the
degree that the individual has a conviction
that it is possible to influence the outcome
of events and that significant satisfactions
can be obtained from making the effort, as
opposed to a conviction that it is not worth
while to try to influence the outcome of
events because he feels he cannot have much
influence, because he feels that no significant
satisfaction can be obtained from making the
effort, or he feels that he can receive ample
satisfaction without making the effort. Examples
of positive items are: the major satisfaction
he gets from playing cards or similar
competitive games is the opportunity to play
well; the opportunity to understand just how
one’s superior expects work to be done is
less important than some other choice including
freedom in working out one’s own
methods of doing the work; as an adolescent
he disagreed occasionally or frequently with
parents on political, religious, social, or other
issues; and he feels that in dealing with other
people it is important to avoid being diverted
from doing what is right in order to please
someone. Examples of negative items are:
was obedient toward his parents as an adolescent;
considers competent co-workers less
important than other choices, including
courteous treatment from superiors and certainty
that one’s work will be judged by fair
standards; and likes best a supervisor who
gives him clear-cut instructions and is always
available for advice. Perhaps this element is
related to the Achievement Motive (McClelland,
Atkinson, Clark, & Lowell, 1953).

It should not be assumed that the com- 
bined score on the three elements shown in 
Table 1 is primarily a measure of effective 
performance. It may be a measure of factors 
which bring officers to the early attention of 
Promotion Panels. It will be interesting to 
see if these scores continue to predict speed 
of promotions over a longer period of time. 


REFERENCES 


Cattell, R. B. Description and measurement of
personality. New York: World Book, 1946.

McClelland, D., Atkinson, J. W., Clark, R. A.,
& Lowell, E. L. The achievement motive. New
York: Appleton-Century-Crofts, 1953.

Walther, R. H. Self-description and occupational
choice. Paper read at American Psychological
Association, New York, September 4, 1961. (a)

Walther, R. H. Self-description as a predictor of
success or failure in foreign service clerical jobs.
J. appl. Psychol., 1961, 45, 16-21. (b)


(Received May 9, 1960) 





Journal of Applied Psychology
1962, Vol. 46, No. 5, 317-320


PREDICTION OF COLLEGE ACHIEVEMENT FROM THE 
EDWARDS PERSONAL PREFERENCE SCHEDULE 
AT THREE LEVELS OF INTELLECTUAL ABILITY 


LEONARD D. GOODSTEIN AND ALFRED B. HEILBRUN, JR.1


University of Iowa 


The scores on the Edwards Personal Preference Schedule (EPPS) were cor- 
related with the semester grade point average on a sample of 357 under- 
graduates, 206 males and 151 females, with the variance attributable to a 
brief vocabulary test estimate of scholastic ability partialled out. While
the results of the analysis of the total male and female groups were essentially
negative, further analyses which followed a subdivision of each of the 2
sex groups into low, middle, and high ability groupings yielded more promising
results, especially for the middle ability male subgroup. Following a comparison
of the obtained results with previous studies, the importance of using levels
of intellectual ability as a control variable in studies of nonintellectual factors
in achievement was noted.


The prediction of scholastic success in 
academic institutions has been a perennial 
problem for the applied psychologist. The 
initial research with tests of scholastic apti- 
tude or intelligence as predictors of academic 
achievement has made it clearly evident that 
such tests of intelligence are quite useful for 
this purpose. At the same time it is also 
clear that predictions based upon measures 
of scholastic aptitude are far from perfect 
and, indeed, typically such measures account 
for less than half of the variance in academic 
performance. 

Recently there has developed some research 
interest in nonintellectual factors, especially 
personality variables, as an additional rele- 
vant source of variance in the prediction of 
academic achievement. The Edwards Per- 
sonal Preference Schedule (EPPS) would ap- 
pear to be especially useful in this respect 
as many of the personality factors or “needs” 
tapped by this instrument are logically re- 
lated to academic success, e.g., Achievement, 
Endurance (Edwards, 1959). 

Klett (1957) has used the EPPS to predict
academic success in a large, unselected
high school sample with rather equivocal results,
while both Gebhart and Hoyt (1958)
and Krug (1959) have studied the usefulness
of the EPPS in understanding the special
problems of over- and underachievement in
college. The latter two studies yielded consistent
positive results suggesting that the
EPPS may have more predictive value with
groups of college or university students than
with high school students. However, an important
question is whether a personality
measure such as the EPPS can contribute to
the prediction of academic success in an unselected
group of college students. Although
over- and underachievement are important
aspects of college success, the present study
is concerned with the contribution of the
EPPS to the prediction of academic achievement
over the entire range of scholastic
ability in undergraduate college students.

1 The authors are most appreciative of the assistance
of Dee W. Norton in arranging for the statistical
treatment of the data.


PROCEDURE 


The 357 undergraduate students, 206 males and 
151 females, enrolled in two large elementary psy- 
chology courses at the State University of Iowa 
took the EPPS early in the semester under standard 
conditions “for research purposes.” The group was 
predominantly sophomores, with some juniors and a 
few seniors. The score on a 20-minute, 60-item 
vocabulary test taken upon entrance to the uni- 
versity was available for each student as a measure 
of scholastic aptitude. Previous unpublished research 2
has indicated that this vocabulary test is a
statistically reliable predictor of academic achievement
(r’s with freshmen grades ranging from .45
to .55 with Ns = 500 have been obtained) and,
further, has shown the correlation between the
vocabulary test scores and scores on the revised
Henmon-Nelson Test of Mental Ability, a more
commonly used measure of scholastic aptitude, to
be relatively high (r = .70, N = 500). At the conclusion
of the semester the grade point average
(GPA) for each student for all courses that semester
was secured from the University Registrar.

2 The authors acknowledge their indebtedness to
Arthur Mittman, Director, University Examination
Service, University of Iowa, for making these
unpublished data available.

TABLE 1
CORRELATIONS (WITH INTELLIGENCE PARTIALLED OUT) BETWEEN GRADE POINT AVERAGE AND THE
15 SCALES OF THE EDWARDS PERSONAL PREFERENCE SCHEDULE (EPPS) OBTAINED IN THE TOTAL
AND EACH OF THE ABILITY SUBGROUPS
[Columns: for males, total group (N = 206) and low, middle, and high ability groups; for females, total group (N = 151) and low, middle, and high ability groups. Rows: Achievement, Deference, Order, Exhibition, Autonomy, Succorance, Affiliation, Intraception, Dominance, Abasement, Nurturance, Change, Endurance, Heterosexuality, Aggression. The tabled correlations are not reliably recoverable.]
* p < .05.
** p < .01.


RESULTS 


The product-moment correlations between
the semester GPAs, the vocabulary test
scores, and scores on the 15 personality variables
of the EPPS were computed separately
by sex for the total group. The obtained
correlation between vocabulary test scores
and GPA was .46 for the males and .42 for
the females (both p’s < .01). The correlations
between the vocabulary test scores and
the EPPS scale scores ranged from -.23 to
.37. Approximately 20% of the correlations
between the vocabulary test scores and EPPS
scale scores were statistically significant
(p < .05).
The partial correlations between the 15
EPPS scales and GPA, with the variance
attributable to the relationship with the 
vocabulary estimate of intelligence partialled 
out, were then computed by sex for the total 
group. The resultant partial correlations for 
the two samples are presented in Table 1. 
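
The article does not print the formula used, but partialling a third variable out of a correlation is conventionally done with the first-order partial correlation, sketched below. Only the vocabulary-GPA value of .46 comes from the text; the other two inputs are hypothetical, chosen for illustration, and the function name is an assumption.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y with z partialled out (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# r_vocab_gpa = .46 is the vocabulary-GPA correlation reported for the males.
# The other two values are hypothetical, chosen only for illustration.
r_scale_gpa = 0.30    # hypothetical zero-order r between an EPPS scale and GPA
r_scale_vocab = 0.20  # hypothetical r between that scale and the vocabulary test
r_vocab_gpa = 0.46

result = partial_correlation(r_scale_gpa, r_scale_vocab, r_vocab_gpa)
print(round(result, 3))
```

When a scale is uncorrelated with the vocabulary test, the numerator is unchanged and only the denominator shrinks, so the partial correlation is somewhat larger in absolute size than the zero-order value.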

For the total groups, Achievement is posi- 
tively correlated (p < .01) with GPA for the 
males while none of the partial correlations 
for the females is significant. 

These essentially negative results with
students of a wide range of intellectual ability
gave rise to the possibility that personality
correlates of college achievement may
be specifically tied to ability levels and that
these relationships are masked in the type of
analysis which fails to consider levels. For
example, personality traits which tend to enhance
or interfere with college achievement
in a low ability student may differ from such
enhancing or interfering traits in the middle
or high ability student. To investigate this
possibility, each of the two sex groups was
subdivided into three equal sized subgroups,
with scores on the vocabulary tests 3 defining
Low, Middle, and High intellectual ability
groups.

The product-moment correlations between 
GPA, vocabulary test scores, and the 15 
EPPS scores were then computed for each 
of the six subgroups. For the males, the 
obtained correlation between the vocabulary 
test scores and GPA was .28 for the Low 
ability group, .11 for the Middle ability 
group, and .33 for the High ability group 
while for the females the correlations were 
.32, -.05, and .40, respectively. The correlations
between the vocabulary test scores
and the EPPS scale scores ranged from -.32
to .42 with approximately 25% of the correlations
statistically significant (p < .05).

The partial correlations between the 15
EPPS personality scores and GPA, with
intelligence partialled out, for each of the six
ability subgroups were then computed and
are also presented in Table 1.

For the Low ability males, Autonomy and
Nurturance are negatively correlated with
GPA; for the Middle ability males, Achievement
and Endurance are positively correlated,
and Affiliation, Intraception, Nurturance,
and Change negatively correlated, with GPA;
and for the High ability males, Aggression is
negatively correlated with GPA. For the Low
ability females, Abasement and Nurturance
are negatively correlated with GPA; for the
High ability females Intraception is positively
correlated with GPA, but none of the partial
correlations for the Middle ability female
group is statistically reliable.


DISCUSSION 


The results of the present study offer some support for the notion that personality factors are significantly related to academic achievement when the influence of academic ability is statistically removed, but that the nature of the relationships depends upon the general ability level of the group being studied. Thus, when heterogeneous ability groups are studied and levels of ability are ignored as a variable, the true relationships between personality factors and achievement may be concealed. The different relationship between the personality measures and achievement at the three levels of ability is rather striking. For example, Endurance is positively (r = .48) and significantly correlated with achievement in the Middle ability male subgroup but these variables are insignificantly negatively correlated (r = −.03) in the High ability male subgroup. Abasement is negatively correlated (r = −.38) with achievement in the Low ability female subgroup and positively correlated in the other two female ability subgroups, although neither of the latter correlations is statistically reliable. Thus, it should not be surprising that the obtained correlations between Endurance and Achievement for all males and between Abasement and Achievement for all females are approximately zero. The dependence of personality trait-college achievement relationships upon ability level is also clearly suggested by the fact that in only one instance out of 11 was a given personality correlate of achievement found at more than one ability level.

An inspection of the vocabulary test scores for the total group suggests that the score distribution is not different from that of the university’s normative group. Further, inspection of the EPPS scale means reveals them to be quite similar to those reported by Edwards (1959) for college students, while a view of the academic transcripts of the subjects suggests nothing atypical was involved in the GPAs obtained.

If the relationships between personality measures and academic achievement depend upon the general level of intellectual ability, the failure to investigate by levels of ability may result in spuriously negative findings. Klett’s (1957) inability to find stable EPPS correlates of academic achievement in high school groups who are even more heterogeneous in ability than the group involved in the present study may possibly be explained in this fashion. The importance of studying individuals at one homogeneous level of intellectual ability is further illustrated by the recent study by Holland (1959) who found many personality correlates of academic achievement in a very high ability sample of college students, using the California Psychological Inventory as his measure of personality.
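The concealment argument above can be illustrated with a small numerical sketch. The scores below are hypothetical and deliberately extreme (they are not the study’s data); they only show how a trait that correlates positively with achievement in one ability subgroup and negatively in another can yield a pooled correlation of approximately zero. The function name `pearson_r` and all variable names are assumptions made for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical trait scores and grade indices for two ability subgroups.
group_a_trait, group_a_grades = [1, 2, 3, 4], [1, 2, 3, 4]   # positive relation
group_b_trait, group_b_grades = [1, 2, 3, 4], [4, 3, 2, 1]   # negative relation

r_a = pearson_r(group_a_trait, group_a_grades)
r_b = pearson_r(group_b_trait, group_b_grades)

# Pooling the subgroups ignores ability level as a variable.
r_pooled = pearson_r(group_a_trait + group_b_trait,
                     group_a_grades + group_b_grades)

print(r_a, r_b, r_pooled)
```

In this constructed case the two subgroup relationships cancel exactly when the groups are pooled, which is the pattern the discussion attributes to the Endurance and Abasement scales.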

The present results for the Middle ability 
males are rather like those reported by both 
Gebhart and Hoyt (1958) and Krug (1959),
although neither of these studies reported
correlational results. Achievement is posi- 
tively related and Affiliation negatively re- 
lated to academic success in all three studies. 
Gebhart and Hoyt found Nurturance and 
Change to be negatively related to achieve- 
ment while Krug found that Endurance is 
positively related to achievement; these re- 
sults are also consistent with those provided 
by the Middle ability male Ss in the present 
study. The only noteworthy inconsistency is 
the EPPS Order scale, which is unrelated to 
achievement in the present study but posi- 
tively related in both of the earlier studies. 

One possible general interpretation of the 
present findings is that personality factors 
are most important in determining the aca- 
demic achievement of the average ability 
college male. While all males may have a 
special stake in college achievement, the suc- 
cess of relatively bright and dull males is 
more determined by intellectual factors than 
is the case with the average ability males. 
In this group of average ability males, intellectual
factors are less predictive of success,
and personality factors are the more important
determiners of actual academic success
and failure. Obviously additional research
will be necessary to test the appropriateness 
of this interpretation. 

REFERENCES 

Edwards, A. L. Manual for the Edwards Personal Preference Schedule. (Rev. ed.) New York: Psychological Corporation, 1959.

Gebhart, G. G., & Hoyt, D. P. Personality needs for under- and overachieving freshmen. J. appl. Psychol., 1958, 42, 125-128.

Holland, J. L. The prediction of college grades from the California Psychological Inventory and the Scholastic Aptitude Test. J. educ. Psychol., 1959, 50, 135-142.

Klett, Shirley L. The Edwards Personal Preference Schedule and academic achievement. Unpublished doctoral dissertation, University of Washington, 1957.

Krug, R. E. Over- and underachievement and the Edwards Personal Preference Schedule. J. appl. Psychol., 1959, 43, 133-136.

(Received August 5, 1960)





Journal of Applied Psychology
1962, Vol. 46, No. 5, 321-324


ORAL COMMUNICATION AND SALES EFFECTIVENESS ¹


R. WAYNE PACE 

Fresno State College 
The relationship between oral communication and sales effectiveness was investigated. 2 equatable groups of working sales people, a more effective “high” group and a less effective “low” group, were compared in terms of selected aspects of their communication behavior. Findings indicated that evaluations of basic oral communication skill (including listening) reliably differentiated between the 2 groups. Sales methods such as using “emotional appeals” and “dramatizing” also distinguished between the more effective and the less effective sales people. Results suggested: that evaluations of basic oral communication skill by a trained interviewer should be one valuable indicator of sales success; that communication training programs should subordinate fragmental, relatively isolated details of communication behavior to consideration of generalized communication skill; and that sales people who are inferior in basic oral communication skill will also be less effective in utilizing specialized persuasive techniques.


In this age of distribution, marketing 
methods have taken on vast changes in the 
constant struggle to get products to the con- 
sumer in the quickest and most economical 
way possible. Revolutionary new methods are 
being employed while traditional sales tech- 
niques are being given a new look. A desire 
to accelerate distribution has prompted com- 
panies to review closely one of the oldest, 
time-honored methods of distributing goods, 
variously called door-to-door, person-to- 
person, or direct selling. 

The salesman engaged in direct selling 
finds himself involved in a surprisingly com- 
plex interpersonal relationship. To be suc- 
cessful, he must so communicate with his 
prospect that he is able to evoke an almost 
immediate change in behavior. Nirenberg 
(1958) emphasizes the relationship between 
oral communication and sales effectiveness, 
when he states: “Managing and selling can
be no better than the face-to-face communi- 
cation involved in them.” 


PURPOSE AND SCOPE 


The study on which this report is based investigated two questions: whether there are attributes of oral communication that reliably distinguish between the “better” and “poorer” selling of sales people, and whether there are some attributes of oral communication that seem to be characteristic of all sales people in general (regardless of degrees of success). This article is limited to a discussion of the methods and only the most salient results and conclusions concerned with attributes which differentiated between the more effective and the less effective sales representatives.

1 This paper is based upon a dissertation submitted to the Graduate School, Purdue University, in partial fulfillment of the requirements for the PhD degree. The dissertation was written under the direction of W. Charles Redding.

2 Formerly, Instructor in Speech, Purdue University; now, Assistant Professor of Speech, Fresno State College.


METHODS AND PROCEDURES 


Two equatable groups of working sales people, differentiated into a more effective “high” group and a less effective “low” group by means of a Sales Effectiveness Index (SEI), were compared in terms of selected aspects of their communication behavior. The SEI criterion was computed by dividing the net dollar value of sales by hours devoted to active selling:

$$\mathrm{SEI} = \frac{\text{Net dollar value of sales}}{\text{Hours devoted to active selling}}$$

A neutral third party computed the SEI in order to avoid possible bias on the part of the investigator in the perception and recording of responses and evaluations, which might have occurred had he known which subjects were assigned to the high and low groups. The neutral party retained all “criterion” data until the “raw” data had been gathered, coded, and tabulated for statistical analysis. A t test (Guilford, 1956), run between the mean SEIs of the two groups, yielded a value significant beyond the .02 level of confidence.

The subjects studied were all engaged in personal house-to-house selling for a nationwide retail sales organization, with their prospective customers normally being housewives. They were all women (usually themselves housewives selling on a part-time basis) and residents of a fairly small and homogeneous area in the Midwest. Their selling activities were customarily characteristic of “specialty” sales people.

Most of the primary data were gathered by means of oral, face-to-face interviews and a basic oral communication evaluation form. The data were codified into equivalent, dichotomized categories (e.g., “yes-no”). The categorized data were then analyzed to examine the extent to which the findings were significantly different for the two groups. Chi square (Guilford, 1956), corrected for continuity, was the procedure chosen for this analysis. The criterion of statistical significance was the .05 level of confidence throughout, using a one-tailed test.

3 Additional details concerning attributes of oral communication characteristic of the sales representatives as a group may be obtained by corresponding directly with the author.


RESULTS


In assessing the relationship of oral communication behavior to sales effectiveness, each of the subjects was rated on six separate factors of oral communication (voice, language, bodily behavior, listening, personal attitudes, initial impression) and on a “gestalt” or “overall impression.” These ratings were made by the investigator on the basis of his direct, personal observations of respondent behavior in two separate interviews. The results of these ratings are summarized in Table 1. The Item column contains brief descriptive phrases identifying the factors being evaluated. The Frequency columns contain figures representing the numbers of subjects in the high and low groups, respectively, who responded or were evaluated by the interviewer affirmatively and negatively on each item.


TABLE 1

SUMMARY OF INTERVIEWER’S RATINGS OF RESPONDENT ORAL COMMUNICATION SKILLS

Item                                                                        Chi square
1. Overall impression (a): Rated 5 or higher?                               [illegible]
2. Summated scores from six individual items (b): Totaled 35 or higher?     4.544**
3. Use of voice (c): Rated 5 or higher?                                     .483
4. Use of language (c): Rated 5 or higher?                                  4.803**
5. Eye contact and bodily behavior (c): Rated 5 or higher?                  1.101
6. Quality of listening (c): Rated 5 or higher?                             .052
7. Personal attitudes (c): Rated 5 or higher?                               .022
8. Initial impression (c): Rated 5 or higher?                               1.960

(a) Based on a 7-point scale.
(b) Based on a range: 6-54 points.
(c) Based on a 9-point scale.
[Frequency columns (High, Low; Yes, No) and the significance levels attached to * and ** are not recoverable; both significance footnotes refer to a one-tailed test.]


Item 1, “overall impression” of communication skill, has a highly significant chi square, indicating that this admittedly subjective evaluation distinguished reliably between the high and the low group of sales people. This generalized impression of communication skill apparently has a relationship to effectiveness in direct selling, as investigated in this study.

Numerical ratings of the six specific factors were summed. Using an arbitrary cutoff score of 35 (out of a maximum 54), a significant difference between the two groups occurred. Feeling that an artifact may have been introduced by such an arbitrary decision, the means of the summated ratings were computed for the two groups; a t test was run to analyze the difference between the means. Based on a range of 6-54 points, the mean for the high group was 37.4 and the mean for the low group 29.4. The resulting t value of 2.232 was significant at the .025 level of confidence. Thus, both methods of computation (the cutoff score and the means) indicated real differences between the two groups.

In addition, a rating of five or higher on the single factor, “use of language,” was also found to be statistically significant (at the .025 level). Other factors taken by themselves, however, were not found to differentiate significantly between the highs and the lows.

During the process of the face-to-face interviews, respondents were asked to give their opinions about whether they felt they employed the following persuasive techniques: sales arguments, emotional appeals, showmanship, and dramatization. Throughout the interview, the investigator probed the representatives for definitions, explanations, and illustrations taken from their actual sales experiences concerning the use or nonuse of these techniques. Then, on the basis of the descriptions, the interviewer drew inferences about the respondents’ use of the techniques. Results on these responses and evaluations are summarized in Table 2.


TABLE 2

SUMMARY OF RESPONSES CONCERNING SOME TECHNIQUES OF PERSUASION

Item                                                              Chi square
 9. Do you use sales arguments?                                   .000
10. Interviewer’s inference about use of sales arguments          .020
11. Do you use emotional appeals?                                 .070
12. Interviewer’s inference about use of emotional appeals        3.343*
13. Do you use showmanship?                                       .107
14. Interviewer’s inference about use of showmanship              2.340
15. Do you use dramatization?                                     2.047
16. Interviewer’s inference about use of dramatization            4.803*

[Frequency columns (High, Low) are not recoverable.]


Two of the eight items differentiated significantly between the highs and the lows: Item 12, “interviewer’s inference about use of emotional appeals,” and Item 16, “interviewer’s inference about use of dramatization” in sales presentations. The more effective sales representatives were much more likely to use “emotional appeals” and “dramatization” than were less effective representatives.

These techniques are potentially similar in that dramatization probably involves the use of some emotional appeal. For example, requests to buy on the basis of appeals other than performance are generally referred to as emotional. That is, considerations of beauty, style, popularity, are commonly called emotional elements. Dramatization usually refers to the act of painting verbal pictures for customers in an attempt to help them see (mentally) themselves enjoying the benefits of the product for sale. Thus, painting a picture of the beauty of a product would be dramatization using emotional appeals. Interestingly, use of language (Table 1, Item 4) also differentiated between the two groups. Colorful, vivid, picturesque language would be, of course, the vehicle by which emotional elements are dramatized. Hence, although these three items were recorded independently (separated by an hour or more), there may have been some “halo” factor operating. However, the differentiating value of the dramatization, emotional appeal, language items suggests that a real relationship may exist between effective direct selling and this cluster.

In oral communication the voice plays an important role. No significant differences were found between the two groups (by interviewer’s ratings) in the manner in which respondents used their voices (Table 1, Item 3). A large majority of both groups affirmed, however, that a good voice makes a difference between a superior and an inferior sales person (17 of 20 in high group
and 13 of 17 in low group). Nevertheless, 
the highs reported, significantly more fre- 
quently than the lows, having heard record- 
ings of their own voices (Highs: Yes 13, 
No 7; Lows: Yes 4, No 13; chi square: 
4.803, significant at the .025 level). B. J. 
Todd (1946), Ortho Pharmaceutical Corporation,
in discussing a training program for his
salesmen, reported that during the initial 
meeting of the course each salesman was 
given a voice recording; he then offered his 
opinion that “if you have never had the 
experience of listening to a reliable recording 
of your own voice, I can recommend it as 
a powerful means of motivation.” It could 
be that the better sales representatives in this 
study were more highly motivated by hearing 
recordings of their voices, or were more 
highly motivated to obtain recordings of their 
voices. 
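The continuity-corrected chi square named in the Methods can be checked against the one comparison for which the cell frequencies are reported in the text, the voice-recording item (Highs: Yes 13, No 7; Lows: Yes 4, No 13). The sketch below uses the standard Yates-corrected formula for a 2 × 2 table and a normal-curve conversion for the one-tailed probability; the function names are illustrative and are not taken from the paper.

```python
from math import erfc, sqrt


def yates_chi_square(a, b, c, d):
    """Chi square for the 2 x 2 table [[a, b], [c, d]], corrected for continuity."""
    n = a + b + c + d
    # Reduce |ad - bc| by n/2 (Yates correction), never letting it go below zero.
    corrected = max(abs(a * d - b * c) - n / 2.0, 0.0)
    return n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def one_tailed_p(chi_square):
    """One-tailed probability for chi square with 1 df (half the two-tailed value)."""
    return 0.5 * erfc(sqrt(chi_square / 2.0))


# Frequencies reported in the text: heard a recording of own voice (Yes, No).
chi_square = yates_chi_square(13, 7, 4, 13)
p_value = one_tailed_p(chi_square)

print(round(chi_square, 3), round(p_value, 4))
```

Rounded to three decimals the statistic agrees with the reported value of 4.803, and the one-tailed probability falls below the .025 level cited in the text.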


CONCLUSIONS AND IMPLICATIONS 


Results of this study suggest the following
conclusions:

1. The more effective sales representatives
were rated higher than the less effective ones
both in terms of overall impression of 
communication skill and total or summated 
scores from ratings on separate skills. This 
suggests that oral communication skill—at 
least as rated by a trained interviewer—is 
likely to be a reliable criterion for differenti- 
ating superior from inferior sales people (of 
the type studied: part-time, nonprofessional, 
specialty). Of course, there is no reason to 
believe that this type of evaluation would be 
ineffective with other kinds of sales people 
who perform analogous work. This suggestion 
applies, however, only to basic oral com- 
munication skill (including listening); 
whether skills involved in “public speaking” 
would be equally pertinent to the selection or 
evaluation of face-to-face sales people remains
undetermined. It is entirely possible,
however, that another evaluator who chose
to consider “glib” verbal fluency as a cri- 
terion could conceivably have rated a given 
respondent at the bottom of a scale on which 
the present investigator rated the same 
respondent at the top. 

2. Considerable success was achieved in 
differentiating more effective sales people 
from less effective ones by examining such 
sales methods as using emotional appeals 
and dramatizing. These techniques both 
deeply involve generalized communication 
behavior. The suggestion here is that a com- 
munication training program should con- 
stantly subordinate—without ignoring—the 
parts in favor of the whole. With the excep- 
tion of the rather broad area of use of 
language, ratings of sales people on separate 
skills—taken one at a time—failed to dis- 
tinguish between the more effective and the 
less effective sales people. These findings pro- 
vide additional support for the implication 
that generalized communication skill should 
be emphasized more than undue concern with 
fragmental, relatively isolated details of com- 
munication behavior (such as minutiae of 
gestures, vocal inflections, or grammar). This 
does not deny, of course, that any specific 
individual may find improvement in his over- 
all impression by improving a given narrow 
aspect of his communication behavior. The 
implication is clear, however, that sales 
people who are less effective in basic oral 
communication skill will also be less effective 
in utilizing the more specialized aspects of 
persuasive communication. 


REFERENCES 


Guilford, J. P. Fundamental statistics in psychology and education. (3rd ed.) New York: McGraw-Hill, 1956.

Nirenberg, J. S. How to reach minds—and hearts when you talk to people. Sales Mgmt., 1958, 81, 33.

Todd, B. J. How—and why we’re training our salesmen to improve their speech. Sales Mgmt., 1946, 56, 129.

(Received May 15, 1961)





Journal of Applied Psychology
1962, Vol. 46, No. 5, 325-328


VARIABILITY OF PERFORMANCE IN A VIGILANCE TASK 


T. W. FAULKNER ¹


Northwestern University 


An experiment was conducted to determine the effect of signal pattern and 
frequency on the variability of S’s performance in a vigilance task. Ss were 
12 male college students who watched 3 dials during 3 consecutive 27-min. 
periods. Real signals occurred alone in 1 period while 2 different patterns of 
dummy signals were added in the other 2 periods. It was found that dummy 
signals which occurred at semiregular intervals were more effective in reducing 
S’s variability than those which occurred at nonregular intervals. It was also 
found that variability increased with time. It is concluded that use of a semiregular
pattern of dummy signals would be one way of improving performance
on a vigilance task.


It has been demonstrated repeatedly that a 
subject monitoring a single signal source in 
a vigilance test tends to become slower in his 
responses with the passage of time (Baker, 
1960; Jerison, 1957; Mackworth, 1950). This 
decrement in performance may appear either 
as an increase in the number of signals that 
the subject fails to detect or as an increase 
in the time required to respond to the signals. 
Previous results have indicated that an in- 
crease in the frequency of signals will alleviate 
this form of decrement and this finding has 


been confirmed by the present experiment. Previous experimenters have reported also that this decrement does not appear in a multisignal source test (Broadbent, 1950; Jerison, 1957).

It is now proposed that performance does deteriorate on multisignal source tests in terms of the variability of the subject’s performance. The nature of this deterioration suggests that it might be prudent to define more carefully the criterion of performance in a vigilance task than has been done in the past. The purpose of attempts to improve the performance of a vigilance task not only should be to lower the mean response time or the frequency of signals missed, but also to prevent the occurrence of relatively long periods during which the subject will fail to detect a signal if one appears. An increase in the variability of the subject’s responses means that there is an occasional very slow response to a signal and that the frequency of these slow responses will increase with time.

1 Now associated with the Eastman Kodak Company in Rochester, New York.

The author wishes to acknowledge the many helpful suggestions made by G. K. Krulee during the course of the experiment described in this paper.


METHOD 


In this experiment it was desired to test the effect of frequency of signals and pattern of signals on the level of performance in a multisignal source test. In order to accomplish this purpose each subject was presented with three separate sets of signals. Each of these sets was 27 minutes in length. One set contained nine signals arranged in a purely random order. These signals were arbitrarily designated as “real” signals and the set is referred to as R. The two other groups of signals contained this same set of nine signals, but each of them had 27 additional dummy signals superimposed on the sequence of real signals and these are referred to as D1 and D2. In the case of D1, the 27 dummy signals were arranged randomly with the restriction that the range of intersignal intervals for the dummy signals could only vary from 50 to 70 seconds. For D2, the times of occurrence for the dummy signals were selected randomly with the restriction that the range of intersignal intervals for the dummy signals varied from 10 to 110 seconds. The subsets of the nine real signals in D1 and D2 are referred to as RD1 and RD2, respectively. Thus, this experiment compares the effect of superimposing a semiregular pattern of dummy signals on a random pattern of real signals with the effect of superimposing a less regular pattern of dummy signals on the real signals. The three sets of signals were presented in sequence to each subject with a 5-minute break between sets. All possible orders of the three sets were employed in a balanced experimental design. This amounts to using both possible 3 × 3 Latin squares.

The subject sat in a comfortable chair in a semidarkened room facing a box containing three panel-type dc voltmeters and three push buttons. The viewing distance was approximately 24 inches. The signal appeared as a 5-volt deflection on the 30-volt scale of the voltmeter, subtending an arc of about .9 degree at the eye. The pointer remained deflected until the subject pressed the push button.

The subjects were 12 male college students who volunteered to take the test. They were instructed to place their hands on the table which held the meter box so that their hands were about 8 inches away from the push buttons. The subjects were told that the purpose of the experiment was to find out how well people could perform in the sort of work where they were required to detect signals that occurred only infrequently. They were asked to press the push button beneath a voltmeter as quickly as they could after having observed a signal on that voltmeter. The necessity for a quick response was emphasized.
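The counterbalancing described in the Method (all possible orders of the three sets, equivalent to two 3 × 3 Latin squares) can be made concrete with a brief sketch. The helper names and the even allocation of two subjects to each order are assumptions for illustration; the paper states only that the design was balanced. The final lines check that 12 subjects responding to 9, 36, and 36 signals in the three sets give the 972 response times reported in the Results.

```python
from itertools import permutations

CONDITIONS = ("R", "D1", "D2")


def cyclic_square(first_row):
    """Build a Latin square from the cyclic shifts of its first row."""
    n = len(first_row)
    return [tuple(first_row[(i + k) % n] for i in range(n)) for k in range(n)]


def is_latin_square(rows):
    """True if every condition occurs once in each row and each column."""
    symbols = set(rows[0])
    rows_ok = all(len(r) == len(symbols) and set(r) == symbols for r in rows)
    cols_ok = all(set(col) == symbols for col in zip(*rows))
    return len(rows) == len(symbols) and rows_ok and cols_ok


all_orders = set(permutations(CONDITIONS))          # the six possible orders
square_a = cyclic_square(("R", "D1", "D2"))
square_b = cyclic_square(("R", "D2", "D1"))

# Hypothetical allocation: 12 subjects, two per presentation order.
schedule = {subject: sorted(all_orders)[subject % 6] for subject in range(12)}

# Signals per subject: 9 in R, and 9 real + 27 dummy in each of D1 and D2.
signals_per_subject = 9 + (9 + 27) + (9 + 27)
total_responses = 12 * signals_per_subject

print(square_a, square_b, total_responses)
```

Taken together, the two cyclic squares exhaust the six orders, so each set appears equally often in the first, second, and third period.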


RESULTS 


When the distribution of all 972 recorded 
response times was plotted, it became quite 
evident that the data did not fall into a nor- 
mal distribution. Since there was not a reason 
to believe that the universe from which the 
present sample was drawn is normally dis- 
tributed there can be no justification for the 
use of the usual parametric statistical tests. 
Nonparametric tests were used throughout 
the present experiment. 

The two most important aspects of per- 
formance explored in this experiment dealt 
with the variability of the subject’s responses. 
This variability was measured with respect 
to the signal frequency and with respect to 
the passage of time. The measure was of 
intrasubject variability. It proved to be pos- 
sible to use a recently developed nonpara- 
metric test for subject variability (Siegel & 
Tukey, 1960) to test the significance of the 
results obtained in this experiment. The 
findings are summarized in Table 1. 
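The rank assignment at the heart of the Siegel-Tukey procedure can be sketched as follows. The paper does not describe how the test was applied to the intrasubject data, so the response times below are hypothetical, and the sketch assumes two unpaired samples with no tied values and an even total number of observations. Ranks are assigned alternately from the two extremes of the pooled, ordered sample, so that the more variable sample accumulates the smaller rank sum.

```python
def siegel_tukey_ranks(n):
    """Ranks for the n positions of the pooled sample, ordered low to high.

    Rank 1 goes to the lowest value, ranks 2 and 3 to the two highest,
    ranks 4 and 5 to the next two lowest, and so on toward the middle.
    """
    ranks = [0] * n
    low, high = 0, n - 1
    rank = 1
    from_low = True
    take = 1                      # the first step takes a single low value
    while low <= high:
        for _ in range(take):
            if low > high:
                break
            if from_low:
                ranks[low] = rank
                low += 1
            else:
                ranks[high] = rank
                high -= 1
            rank += 1
        from_low = not from_low
        take = 2                  # every later step takes two values
    return ranks


def rank_sums(sample_a, sample_b):
    """Sum of Siegel-Tukey ranks for each of two unpaired samples."""
    pooled = sorted([(v, "a") for v in sample_a] + [(v, "b") for v in sample_b])
    sums = {"a": 0, "b": 0}
    for (_, label), rank in zip(pooled, siegel_tukey_ranks(len(pooled))):
        sums[label] += rank
    return sums


# Hypothetical response times in seconds: an erratic and a steady series.
erratic = [0.4, 0.5, 2.1, 5.3]
steady = [0.7, 0.8, 0.9, 1.0]
sums = rank_sums(erratic, steady)

print(siegel_tukey_ranks(8), sums)
```

The rank sums would then be referred to the ordinary rank-sum (Wilcoxon) tables to judge significance; that step is omitted here.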

Variability on the nine real signals was very significantly reduced when the semiregular pattern of dummy signals was superimposed on the real signals. Use of the less regular pattern of dummies led to a much less significant reduction in variability. In fact, the D2 set of dummy signals can be said to have only questionable value in the reduction of variability. The second important comparison shows that variability does increase with time. That is, the subject’s responses were much more erratic during the third 27-minute period of the test than during the first 27-minute period. This comparison is on the basis of responses to all three sets of real signals.


TABLE 1

SUBJECT VARIABILITY

Comparison              Level of significance
R > RD1                 [illegible]
R > RD2                 [illegible]
Period 3 > Period 1     [illegible]


TABLE 2

MEAN RESPONSE TIMES

Comparison              Mean              Level of significance
[Entries largely illegible in the source; surviving fragments: 98, 1.0, 19, 83, .82]

It was also determined that, on this multi- 
dial test, changes in the frequency of signals 
did affect the mean response time. The Fried- 
man two-way analysis of variance and the 
Wilcoxon matched-pairs signed-ranks tests 
were employed in the analysis of the data. 
The findings are summarized in Table 2. 
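For reference, the Friedman two-way analysis of variance by ranks mentioned above can be computed as in the following sketch. Each row holds one subject’s mean response times under the three conditions; the values are hypothetical and are not the experiment’s data, and no correction for tied ranks is applied. The function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square statistic for n subjects (rows) by k conditions (columns)."""
    n, k = len(rows), len(rows[0])
    rank_totals = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_totals[col] += rank
    return 12.0 / (n * k * (k + 1)) * sum(t * t for t in rank_totals) - 3.0 * n * (k + 1)


# Hypothetical mean response times (seconds) on the nine real signals,
# one row per subject, columns in the order R, RD1, RD2.
times = [
    [1.10, 0.80, 0.85],
    [0.95, 0.75, 0.70],
    [1.20, 0.90, 0.95],
    [1.00, 0.82, 0.80],
]
statistic = friedman_statistic(times)

print(statistic)
```

With k = 3 conditions the statistic is referred to chi square with 2 degrees of freedom for larger samples, or to exact tables for small ones; pairwise follow-up comparisons would use the Wilcoxon matched-pairs signed-ranks test named in the text.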

The response times on the nine real signals were significantly shorter when they were imbedded in the two sets of dummy signals. There was no significant difference between the use of the semiregular pattern and the less regular pattern of dummy signals insofar as the effect on mean response time was concerned. Mean response time was reduced in the D1 and D2 sets, not only on the nine real signals, but on the basis of all 36 signals as well. It is evident that the comparison made on the basis of the nine real signals alone is the most meaningful comparison, but some previous experimenters have made calculations on the basis of all occurring signals. In other analyses it was found that there was no decrement in average response time from one period of the test to the next. Furthermore, there was no significant change in average response time within any of the 27-minute periods. It was also established that the mean response time on any specific signal was not dependent on the length of the intersignal interval that preceded that signal.

The data were characterized by the occasional occurrence of a very long response time. The average response time for all 972 signals presented during the course of the experiment was .83 second. Thirteen of these response times were in excess of 2 seconds in length and three of these were more than 5 seconds in length.


DISCUSSION AND CONCLUSIONS 


The present investigation has shown that 
variability of performance increases with the 
passage of multisignal 
vigilance test. The responses of the subjects 
became erratic after a prolonged spell of 
watching for infrequent signals. In practice 
this could prove to be a very serious form of 
performance deterioration. The existence of 
excessive variability would mean that the 
observer would occasionally allow a 


time in a source 


signal 
to remain for a relatively long time before he 
detected the signal. In many situations this 
would have consequences that could 
never be offset by later fast responses on the 
part of the observer. The object may be not 


grave 


only to keep the mean response time within 
certain limits, but also to prevent the occur- 
rence of response times that 
certain critical limit. The occasional occur- 
rence of very long response times had a major 
effect on the results that were obtained in this 
study. The elimination of the lapses in the 
subject’s ability to detect signals would be the 
most meaningful form of improvement pos- 
sible as far as performance on a vigilance task 
is concerned. This would reduce variability 
and would, at the same time, significantly 
reduce the mean response time. 

Both the D1 and D2 pattern of dummy 
signals produced a significant reduction in 
mean time. Neither set of signals 
was clearly superior in this respect. On the 
other hand, the semiregular pattern of signals 
produced a very significant reduction in sub- 
ject variability while the less regular D2 set 
was of questionable value. It follows that the 
use of a_ semiregular 


exceed a 


response 


pattern of dummy 
signals may be considered to be more effective 
than a set of the D2 variety as a means of 
increasing apparent signal frequency. The use 


PERFORMANCE IN 


? 


A VIGILANCE TASK 327 
of a semiregular pattern of dummy signals 
would have the desired effect of reducing both 
mean response time and subject variability 
in a multisignal source vigilance task 

The comparisons made in this experiment 
have been made on the basis of the three sets 
of real signals. Two of these sets were mixed 
with 27 additional signals which the subject 
could not distinguish from the signals 
It is important when discussing the use of 
dummy signals to emphasize the fact that 
comparisons were made on the basis of the 
real signals by themselves. 
to believe that the same results would neces- 
sarily be 


real 


There is no reason 


obtained if the comparisons were 
based on all signals, as has been the case in 
some past experiments. If the performance of 
a vigilance task does depend in any way on 
signal pattern then it would be dangerous to 
assume that performance on some set of signals 
would be the as performance on a 
specific subset of signals. In fact, the present 
experiment shows that signal pattern may 
have some effect. The D1 pattern of dummy 
signals produced a very significant reduction 
in subject variability while the D2 pattern 
had no really significant effect at all. Previous 
experiments, as well as the present one, have 
shown that performance on any specific signal 
does not depend on the length of the interval 
that preceded that signal (Deese, 1955; Mc- 
Cormack, 1958). This finding has been in- 
terpreted by some reporters to mean _ that 
signal pattern is without effect. This 
clusion is not completely justified 
analyses of this sort only examine specific 
signals with their preceding intervals while a 


Same 


con- 


since 


signal pattern is some definite arrangement 
of intervals within time. The only method for 
evaluating the importance of signal pattern 
as a parameter of performance is to compare 
the effect of sharply differing patterns on 
overall performance. The present experiment 
demonstrated that there are at least two pat- 
terns of signals which differ in this effect on 
subject signal 
pattern deserves to be more fully investigated 


variability. The question of 


before any final conclusions can be drawn. 
From the results of this experiment it is 
seen that variability of performance is a very 
important aspect of vigilance. It is necessary 
that an attempt be made to prevent the oc- 
currence of occasional very long response 
times in order to reduce subject variability 
and mean response time. It has been proposed 
that one way of accomplishing this would be 
the addition of a semiregular pattern of 
dummy signals to the regular series of inputs. 


REFERENCES 

Baker, C. H. Maintaining the level of vigilance by 
means of artificial signals. J. appl. Psychol., 1960, 
45, 336-338. 

Broadbent, D. E. The Twenty Dials Test under 
quiet conditions. Appl. Psychol. Unit Rep., 1950, 
No. 130. 

Deese, J. Some problems in the theory of vigilance. 
Psychol. Rev., 1955, 62, 359-368. 

Jerison, H. J., & Wallis, R. A. One-clock and 
three-clock monitoring. USAF WADC tech. Rep., 
1957, No. 57-206. 

McCormack, P. D. Performance in a vigilance task 
as a function of inter-stimulus interval and inter- 
polated rest. Canad. J. Psychol., 1958, 12, 242-256. 

Mackworth, N. H. The measurement of human 
performance. Med. Res. Council spec. rep. Ser., 
1950, No. 268. 

Siegel, S., & Tukey, J. A non-parametric sum of 
ranks procedure for relative spread in unpaired 
samples. J. Amer. Statist. Ass., 1960, 55, 429. 


(Received July 28, 1961) 





Journal of Applied Psychology 
1962, Vol. 46, No. 5, 329-331 
RETEST CONSISTENCY AND THE WRITING OF LIFE 
HISTORY ITEMS: 
A FIRST STEP 
WILLIAM A. OWENS 
Purdue University 


J. R. GLENNON AND LEWIS E. ALBRIGHT 


American Oil Company 


Retest consistency was employed as a criterion for the evaluation of life 
history items, and rules for item writing or selection such that this criterion 
may be better satisfied were suggested. The method employed required: an 
inspection of consistent and inconsistent items with a view to deriving rules, 
and the “blind” sorting of these items in accordance with a given rule by 
5 independent judges to determine the coincidence of rule conformity with 
retest consistency. 4 rules satisfied a statistical criterion of significant associa- 
tion. These deal with brevity, with the definition of options, with the 
undesirability of “forcing,” and with the pleasant or unpleasant implications 
of a response for the respondent. 


Toward the conclusion of World War II, 
the Personnel Research Section, Classification 
and Replacement Branch, The Adjutant 
General’s Office developed and recommended 
procedures for the identification and reten- 
tion of officers most valuable to the service. 
One of several devices found to have promis- 
ing validity was a so-called Biographical In- 
formation Blank, or scored life history. 
Subsequent to the war, interest in the utility 
of this versatile and practical measuring 
device seems to have been rekindled. 
Ghiselli’s (1955) scopeful survey indicates 
that life history data have had substantial 
utility in the prediction of either a train- 
ability or a proficiency criterion for four of 
five major occupational classifications dealt 
with. In addition, such studies as those of 
Mosel and Cozan (1952) and Keating, 
Paterson, and Stone (1950) are in accord 
in indicating that the correlations between 
information provided by the applicant and 
that provided by his employer range through 
the .90s for a selected list of items deal- 
ing with employment. In spite of this 
encouraging evidence regarding the predictive 
validity of life history forms and the accuracy 
of the information provided on them, studies 
concerned with their reliabilities have often 
suggested that they leave something to be 
desired. Low estimates have undoubtedly 
been obtained for at least two reasons. The 
first is that unless special attention is given 
to classifying them, life history items tend to 
be heterogeneous and to yield low split-test 
estimates for this reason (Siegel, 1956). The 
second is that some individual items are, by 
virtue of certain form and content features, 
less likely to receive consistent responses than 
are others. If a reasonably definitive analysis 
could be made of the differences between 
items having good and poor retest consist- 
encies, life history forms having more of the 
former and fewer of the latter would no 
doubt be more reliable and possibly more 
valid as well. Hopefully, the present investi- 
gation represents a step in this direction. 


PROBLEM 


Overall, it was the purpose of this study 
to educe rules for the writing and evaluation 
of life history items which would be favor- 
able to the production or selection of those 
possessing high potential retest reliability or 
consistency. 


METHOD 


This problem was attacked by (a) examining dif- 
ferences between consistent and inconsistent items, 
by (b) attempting to formulate rules descriptive of 
these differences, and by (c) tentatively evaluating 
the rules so derived. 

Consistent and inconsistent life history items were 
selected from a list of 200 which had been twice 
administered to a total of 43 subjects (25 employed 
research scientists and 18 university evening stu- 
dents). An interval of approximately 2 months 
intervened between the administrations. Thirty-five 
of the 200 items were arbitrarily identified as 
highly consistent because there were three or fewer 
changes in response to them from test to retest. 
Similarly, 37 items were identified as inconsistent 
on the basis that responses to them had been 
changed from 11 to 17 times between test and 
retest. 

The investigators examined the lists of consistent 
and inconsistent items at length and tentatively 
formulated some 12 rules or principles which they 
felt might differentiate the one list from the other. 
Cursory tabulations revealed that only 4 of the 12 
held real promise in the context of the present 
items and criterion. These were presented to five 
graduate students in industrial psychology who took 
the items in pied order and rated each one suc- 
cessively as conforming or not conforming to a 
given rule. 

The number of items conforming to, or violating, 
a particular rule, within both consistent and in- 
consistent categories, was summarized in a four- 
fold table by pooling the judges’ ratings. The chi 
square test was then employed to determine whether 
or not violations were independent of category. 
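The fourfold-table test described in the Method can be sketched in a few lines. The counts used here are the line-count figures reported under Rule 1 in the Results (10 of 35 consistent items and 22 of 37 inconsistent items ran eight lines or more); they are not the pooled judges' ratings on which the authors' own tables were based, so the statistic is illustrative only. The function name is hypothetical, and the absence of a continuity correction is an assumption, since the article does not say which form of the test was used.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the
    fourfold table [[a, b], [c, d]], with its upper-tail probability."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 df, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: consistent items (35), inconsistent items (37).
# Columns: eight lines or more, fewer than eight lines.
chi2, p = chi_square_2x2(10, 25, 22, 15)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

A large statistic here means that item length and retest consistency are not independent in this pool, which is the sense in which a rule was said to "satisfy" the criterion.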


RESULTS AND DISCUSSION 

It was discovered that all four of the rules 
in accordance with which the items were 
rated yielded chi square values significant 
above the 1% level. These rules appear 
below: 

Rule 1. Brevity is desirable. For example, 
by actual line count, the average length of 
stem among the consistent items was ap- 
proximately two lines as contrasted with 24 
among the inconsistent. Comparably, there 
were 22 items of eight lines or more among 
the inconsistent and 10 such items among the 
consistent. 

Rule 2. Whenever possible, numbers should 
be used to graduate and to define options or 
alternatives. For example: 


How old are you now? 

1. 20 to 29 
2. 30 to 39 
3. 40 to 49 
4. 50 to 59 
5. 60 or older 

There were nine such items among those in 
the consistent category and only four among 
those of the inconsistent category. 


Rule 3. Either all response options or 
alternatives should be covered or an “escape” 
option should be provided. Item A below was 
judged as conforming to this rule and Item B 
as violating it. 


Item A: How old were you when you were (first) 
married? 

1. Less than 20 years old 
2. 20 to 23 years old 
3. 24 to 25 years old 
4. 26 to 30 years old 
5. Over 31 years old 
6. Am not married 

Item B: During your grammar and/or high 
school days, in which type of activity did you 
participate the most? (Check one) 

1. Sand lot games 
2. Boy Scouts, 4-H clubs, FFA, or YMCA 
3. Student government, school politics 
4. Student paper, science clubs 
5. I worked or studied most of the time and 
did not participate 

This rule, relating to the undesirability of 
“forcing,” is not inconsistent with the body 
of research done on the forced-choice method. 
Here the general finding seems to be that 
forcing improves validity but decreases 
reliability. 

Rule 4. Items, particularly item stems, 
should carry a neutral or a pleasant con- 
notation for the respondent. Item A conforms 
to this rule whereas B does not. 


Item A: In what part of the country did you live 
most of the time before you were 18? (Mark only 
one) 

1. The Northeast (including Pennsylvania and 
New Jersey) 
2. The South (including Texas and Oklahoma) 
3. The Middle West (including the Rocky 
Mountain area) 
4. The Pacific Coast 
5. Outside the continental United States 

Item B: Which of the following best describes 
you? 

1. Socially introverted—not a joiner 
2. A dreamer would rather speculate than 
plunge into action 
3. Unconventional—not much influenced by 
precedence 
4. Physically lazy—intrigued with all labor 
saving devices and techniques 
5. Dislikes routine or detailed work 




As in the case of Rule 3, this rule also ap- 
pears against a backdrop of relevant re- 
search. It is a common finding, when ques- 
tionnaires are repeated, that the respondent 
selects fewer options which tend to place him 
in an unfavorable light on a retest than on 
an initial testing. This systematic tendency 
to change responses to unflattering options 
would, of course, argue that poorer reliability 
should accompany lower hedonic tone. 
Although these four rules may be of some 
guidance in item writing, it is essential to 
pursue this entire matter much further, uti- 
lizing new items, new judges, and new cri- 
teria. It is possible, for example, that some 
rules derived were peculiar to this pool of 
items, and it is a virtual certainty that a 
number of others not derived would have 
emerged from a larger and more varied as- 
sortment. Similarly, since the validity and 
reliability concepts are not completely com- 
patible, it is vital to determine some of the 
rules which favor the maximizing of validity. 
If a replication of sorts were to be under- 
taken it would be interesting to examine cer- 
tain rules which could not be substantiated 
in the present study, but which had con- 
siderable rational appeal. Five of these 
follow: 

1. Items should not be unrealistic in what 
they require of memory. 

2. A currently correct response should not 
be subject to too rapid short term evolution. 
For example, “Where did you spend your 
last vacation?” 

3. Choices from the extremes of a con- 
tinuum are more consistent than those from 
more neutral positions. “Second best” and 
“second poorest” types of options seem 
particularly unstable. 

4. Positively worded statements are pref- 
erable to negatively worded statements. For 
example, were you most often (rather than 
least often) regarded during your youth as 
sensitive, overconfident, etc. 

5. Qualitative gradations of response sub- 
ject to differential perception should be 
avoided. A response continuum such as 
“seldom, occasionally, frequently, very fre- 
quently” is subject to such differential inter- 
pretation. 

REFERENCES 

Ghiselli, E. E. The average utility of various types 
of tests. Berkeley: Univer. California Press, 1955. 

Keating, Elizabeth, Paterson, D. G., & Stone, 
C. H. Validity of work histories obtained by 
interview. J. appl. Psychol., 1950, 34, 1-5. 

Mosel, J. N., & Cozan, L. W. The accuracy of 
application blank work histories. J. appl. Psychol., 
1952, 36, 265-369. 

Siegel, L. A biographical inventory for students: 
I. Construction and standardization of the instru- 
ment. J. appl. Psychol., 1956, 40, 5-10. 


(Received August 3, 1961) 





Journal of Applied Psychology 
1962, Vol. 46, No. 5, 332-336 


EVALUATION OF INPUT DEVICES FOR A DATA 
SETTING TASK 


FRANK J. MINOR AND STANLEY L. REVESMAN 
International Business Machines Corporation, General Products Division, 


Development Laboratory, Endicott, New York 


This study evaluates a set of numeric manual entry devices. The task required 
was to set numeric data into the devices. The devices evaluated in this study 
were: a 10-key keyboard, levers, a matrix keyboard, and rotary knobs. The 
criteria by which the devices were evaluated were: error rate, entry time, and 
operator preferences. A repeated measurements design was utilized. Each of 
24 subjects made 175 10-digit entries into each of the 4 devices. The conclusions 
based upon the data are: (a) the 10-key keyboard yields a significantly lower 
error rate and is significantly preferred compared to the other devices. This de- 
vice required significantly less time per entry compared to the lever and rotary 
knobs. (b) The matrix device required significantly less time per entry and was 
significantly preferred compared to the lever and rotary knobs. The matrix 
device does not result in a significantly lower error rate than the latter 
devices. (c) There were no significant differences between the lever and 
rotary knob devices. 


In present data processing systems, varied 
methods of maintaining the accuracy of data 
are implemented. Examples of such accuracy 
checks are manual checking of handwritten 
source documents and verification of punched 
cards prior to data entry into the system. 
With the introduction of “real time” data 
processing systems, however, conventional 
verification procedures, prior to data entry 
into the system, are too time consuming. 
One objective of the real time concept is to 
permit the individual who originates the data 
to enter this data directly into the system 
with a minimum of time loss between the 
steps of data origin and data entry. The 
primary job of the persons who enter this 
data is not data transmission. They may be 
production employees on a factory floor, 
warehouse stock clerks, weather observers, or 
various other personnel engaged in tasks 
which require some form of recording and 
collating information generated by their 
primary job. Real time data processing would 
require such line personnel to set data into 
a manually operated terminal input device 
and transmit the data directly into a com- 
puter. The data is transmitted into the 
computer without verification steps by other 
individuals. 

A well designed terminal input device 
should permit line personnel to enter data 
accurately and rapidly. Conversely, a poorly 
designed terminal unit could result in an in- 
efficient and costly real time data processing 
system due to erroneous data entries, in- 
creases in nonproductive time of the persons 
entering the data, and increases in non- 
productive computer time. The study which 
follows investigates the manual input devices 
to be utilized by factory production person- 
nel. The application in the present study is 
that of job reporting for inventory and 
scheduling control. 

Prior to experimentation, a series of case 
studies were conducted at customer installa- 
tions where production workers might enter 
job information into terminal input devices 
located in the working area. The following 
types of information were collected from the 
case studies: level of education, age, and 
sex of the production personnel; methods of 
data origin by the production employees; 
type, length, and format of the data to be 
entered into the input devices; daily fre- 
quency of data transmission by production 
employee; estimated queueing conditions at 
the terminal input devices; and environ- 
mental conditions under which the input 
devices would be operated. All this informa- 
tion was necessary to identify independent 
variables which would have to be controlled 
or systematically varied for valid laboratory 
experimentation. 

This experiment was undertaken to com- 
paratively evaluate a selected set of numeric 
input devices applicable to a data gathering 
input station. The choice of the devices 
evaluated in this study was based on two 
requirements: the devices were to be selected 
from those currently available and economi- 
cally feasible for real time data processing 
in terms of engineering criteria, and the 
devices had to be amenable to the task of 
production job reporting by production per- 
sonnel unskilled in the use of touch systems 
for clerical-type keyboards. 

As a function of the field studies the 
experiment utilized a defined population and 
a defined method of data origin and data 
length. The performance criteria of percent 
of entries in error and the time required to 
index numeric information into the devices 
were analyzed to answer the following ques- 
tions: what are the differences between the 
terminal input devices in terms of error rate, 
time per entry, and operator preferences? 
What are the differences in the learning 
curves of the devices where learning is de- 
fined as the rate of reduction of errors and 
time per entry as a function of practice? 


METHOD 


Independent Variables (Devices) 


Numeric input devices were studied exclusively 
in this study. With the exception of the 10-key 
keyboard, the devices selected were limited to “on 
shelf” devices which have self-buffering character- 
istics in the keyboard or the display. A device which 
has self-buffering in the display during data entry 
has advantages over an independent buffering sys- 
tem in terms of cost considerations. The devices 
evaluated in this first study each accommodated a 
10-digit numeric word and were as follows: a 
10-key keyboard with a visual display (the visual 
display for the data word serves as the buffer), 
levers with a visual display, a 10 X 10 matrix key- 
board, and rotary knobs with an exposed moving 
scale and a fixed marker line in the 12 o'clock 
position. 

Drawings of the four devices are presented in 
Figure 1. Each device was provided with a 
“transmit” key which the operator depressed when 
he wished to transmit his manually-indexed entry. 
There were two signal lights on each device, one 
being a green “ready” light which indicated to the 
operator that he could initiate a new entry, and 
the second light being a red “in process” light which 
glowed immediately after the operator had depressed 
his transmit key. The in process signal indicated to 
the subject that he was not to initiate a new entry 
since the present entry was in the process of being 
transmitted. The maximum delay from the time of 
depression of the transmit key to the restoration of 
the ready status was approximately 5 seconds. 

FIG. 1. Input devices studied: A = 10-key key- 
board, B = lever device, C = matrix keyboard, and 
D = rotary knob device. 

Data which the operator was required to “set” 
into a terminal input device was not transmitted 
until he depressed the transmit key. Therefore cor- 
rections were possible any time prior to the depres- 
sion of this key. No correction could be made in an 
entry on any input device after the operator had 
depressed the transmit key. The operator could 
check the accuracy of his 10-digit entry prior to 
transmission on the 10-key device and also on the 
lever device by means of a visual display on each 
of these two devices. The matrix device entry was 
checked by the operator noting the depressed key 
in each of the 10 columns. The rotary knob device 
entry was checked by noting the digit dialed be- 
neath the indicator line. On the 10-key device, a 
“clear” key, when depressed by the operator, cleared 
the entire 10-digit buffer storage and display. On 
the remaining three experimental input devices, 
errors detected by the operator prior to transmission 
were corrected by making appropriate changes in 
the individual row settings. 




Dependent Variables 


The dependent variables by which the experi- 
mental input devices were evaluated were: time per 
entry, percent of entries which contained one or 
more operator caused undetected errors, and oper- 
ator preferences. Time per entry was defined as 
the time elapsing between the operator’s indexing 
of the first digit of his data word until the oper- 
ator’s depression of the transmit key. The time was 
measured in .1 second. Error was defined as the 
percent of entries transmitted by the operator which 
contained one or more operator caused undetected 
indexing errors. Operator preferences were measured 
in terms of a subjective ranking by each operator. 
Upon completing his performance on all input 
devices, each subject ranked the four devices on 
the basis of the degree to which he judged the 
devices facilitated accurate entries with a minimum 
of delay and difficulty. 


Apparatus and Stimuli 


Data collection in the experiment was automated 
by an electromechanical data logging system. The 
system was designed to automatically record and 
identify each 10-digit entry made by each subject 
on each device and the time required by the subject 
to make the entry. The data-logging system func- 
tioned with from one to four of the experimental 
input devices operating simultaneously. 

Each experimental input device was physically 
located in separate partitioned adjacent cubicles. 
Devices were mounted so that the center of each 
device was approximately 45 inches from the floor 
line. The exact height and angle of each device 
varied slightly to position each device into its most 
suitable operating position. The experimental task 
was performed by the subjects in a standing 
position. 

Each data entry consisted of a 10-digit numeric 
data word. The 10-digit word was generated by the 
subject from information printed on a stimulus 
source card used in conjunction with a printed 
reference manual. The resulting 10-digit entries 
generated by the subject were predetermined digits 
drawn from a table of random numbers. Seven 
hundred master stimulus source cards were prepared. 
The 700 cards were randomly divided into four 
decks of 175 each. It is assumed that the four decks 
were each of an equal level of difficulty. For each 
subject participating in the experiment, a complete 
set of each of the four stimulus decks was prepared. 


Experimental Design and Procedure 

A repeated measurement experimental design was 
implemented so that each of the 24 subjects per- 
formed once on each of the four input devices. 
Since there were four devices, there were 24 possible 
sequences in which subjects could perform with 
the devices. Each of the subjects was assigned to 
one of the 24 possible sequences so that each 
sequence occurred only once. 

Assignment of the four stimulus decks was such 
that a subject performed with a different deck at 
each of the four devices with which he performed. 
This obviated an improvement in performance as a 
result of familiarity with the stimulus data words. 
The four stimulus decks were assigned to the sub- 
jects so that each of the four decks occurred an 
equal number of times on all four devices. 

The sample of subjects for the experiment was 
selected to be representative in terms of general 
ability level of the expected population of operators 
of the terminal input devices in a “real time” data 
gathering system to be utilized on a production 
floor for inventory and scheduling control. The sub- 
jects were 24 male production employees randomly 
drawn from a variety of machine and assembly 
groups in the International Business Machines 
Endicott plant. The age of the subjects ranged 
between 23 years and 56 years with an average 
of 39 years. The level of education of the sample 
ranged between grades 8-12. None of the subjects 
had previous experience as operators of devices 
similar to the experimental devices. 


Task 

To simulate realistic job characteristics, the sub- 
jects were required to generate their own “data 
word” to be entered into the terminal input device. 
The method of data origin required that the subject 
perform a subtraction to determine his data word 
for each source card. The subject wrote with pencil 
the data word quantity on the appropriate source 
card. The quantity that the subject had written on 
the card was the data entry the subject was to 
make into the experimental input device. The sub- 
ject’s handwritten data word, whether the sub- 
traction was correct or not, served as the criterion 
against which the manual entry was checked for 
accuracy. 

The subject was instructed to process three 
stimulus cards as described, and then to enter them 
one at a time into the input device. The subject 
was instructed to depress the transmit key as soon 
as he was satisfied with the accuracy of the entry 
he had indexed. 
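The counterbalancing described under Experimental Design and Procedure can be sketched as follows. Enumerating the 24 device orders follows directly from the text. The cyclic rule used to pair decks with devices is only one hypothetical scheme that meets the two stated constraints (a different deck at each device for every subject, and each deck used equally often on each device); the article does not say how the authors made the assignment, and the names below are illustrative.

```python
from itertools import permutations

DEVICES = ["10-key", "lever", "matrix", "rotary knob"]
DECKS = ["deck 1", "deck 2", "deck 3", "deck 4"]

# All 4! = 24 orders of the four devices; subject i receives order i,
# so that each sequence occurs exactly once.
orders = list(permutations(DEVICES))


def deck_for(subject, device):
    """Hypothetical cyclic (Latin square) rule for pairing decks with devices."""
    return DECKS[(subject + DEVICES.index(device)) % len(DECKS)]


# schedule[i] lists (device, deck) pairs in the order subject i meets them.
schedule = [
    [(device, deck_for(subject, device)) for device in order]
    for subject, order in enumerate(orders)
]

for subject in range(3):
    print(subject + 1, schedule[subject])
```

Under this rule each deck appears six times on each device across the 24 subjects, so deck difficulty, which the authors assume to be equal, cannot be confounded with device.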


Training and Performance of the Subjects 


Subjects were individually instructed by the ex- 
perimenters in the performance of the required task 
and the operation of the experimental input device. 
Following three demonstration entries by the ex- 
perimenter, the subject was required to process 
three practice stimulus cards and perform the 
manual entries. Upon completing their performance 
on an assigned experimental input device, the sub- 
jects were rotated to the respective devices next 
assigned to them. The same procedure of demon- 
stration and practice preceded the subject’s per- 
formance on each of the input devices. All subjects 
completed their assigned task of 175 entries (one 
deck) with each input device. 


RESULTS 


Analysis of the Error Rate 


Bartlett’s test of homogeneity of variance 
(Edwards, 1957) precluded an assumption 
of homogeneous error data. A nonparametric 
test was therefore conducted to determine 
whether there was any overall difference in 
error rate between the four devices. Fried- 
man’s two-way analysis of variance test 
(Siegel, 1956) indicated that there was a 
statistically significant overall difference in 
error rate between the four devices (p < .01). 
The error rate for each device is presented 
in Table 1. Since the error data were skewed, 
the rate of error is presented in the form of 
median error rates. 


TABLE 1 

PERCENT OF ENTRIES CONTAINING ONE OR MORE ERRORS 

Device                 Median % of erroneous entries 

10-key device          .6 
Lever device           2.3 
Matrix device          1.2 
Rotary knob device     [illegible] 

Individual tests between pairs of devices 
were performed using Friedman’s two-way 
analysis of variance (Siegel, 1956). The tests 
indicated that the 10-key device resulted in 
a significantly lower error rate than the lever, 
matrix, or rotary knob devices (p< .01, 
p< .05, p< .01, respectively). There were 
no other significant differences in error rate 
between the devices. It was also found that 
the error rate was randomly dispersed over 
trials for the four devices. 
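As a rough illustration of the Friedman two-way analysis of variance by ranks used above, the sketch below ranks each subject's scores across conditions and computes the usual statistic from the column rank sums. The error percentages are invented for the example and are not the study's data, the function names are hypothetical, and no correction for ties is applied.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(rows):
    """rows[i][j] is the score of subject i under condition j."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


# Invented percent-error scores: one row per subject, columns in the order
# 10-key, lever, matrix, rotary knob.
scores = [
    [0.6, 2.9, 1.1, 2.3],
    [0.0, 1.7, 1.2, 2.0],
    [1.1, 2.3, 0.6, 3.4],
    [0.6, 3.4, 1.7, 2.9],
]
print(round(friedman_statistic(scores), 2))
```

With four conditions the statistic is referred to the chi-square distribution with 3 degrees of freedom when the number of subjects is reasonably large; the .01 critical value is about 11.34.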


Analysis of Time per Entry 

For each subject the time per entry for 
each successive 25 trials on each device was 
pooled. Because of occasional damaged cards 
from the data logging system, the last set of 
pooled trials for some subjects contained less 
than 25 trials. Therefore the last 25, or less, 
trials were pooled with the previous set of 
25 successive trials. The resulting pooling 
thereby yielded six successive averages for 
each subject on each device. The mean entry 
time for each of these six points for each 
device is graphically presented in Figure 2. 

[Figure 2: line graph of average time per entry against successive sets of pooled trials (1-25, 26-50, 51-75, 76-100, 101-125, 126-175) for the lever, rotary knob, matrix, and 10-key devices.] 

FIG. 2. Average time per entry for each of the 
four devices. (Averages are based on successive sets 
of pooled trials.) 
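The pooling rule described in this section can be sketched as below: five sets of 25 trials, followed by a sixth set that takes every remaining trial. The function name and the sample times are hypothetical, and the sketch assumes that lost records only shorten the final set, which is how the text reads but is not stated in more detail.

```python
def pooled_means(times, block=25, n_sets=6):
    """Mean entry time for each pooled set of trials.

    The first n_sets - 1 sets hold `block` trials each; the final set
    holds everything that is left (50 trials or fewer in this design).
    """
    if len(times) <= (n_sets - 1) * block:
        raise ValueError("too few trials to form the final pooled set")
    sets = [times[i * block:(i + 1) * block] for i in range(n_sets - 1)]
    sets.append(times[(n_sets - 1) * block:])
    return [sum(s) / len(s) for s in sets]


# Invented times (seconds) for one subject on one device, with five
# records lost at the end of the deck (170 of 175 entries logged).
times = [14.0 - 0.02 * trial for trial in range(170)]
print([round(m, 2) for m in pooled_means(times)])
```

Fixing the first five boundaries and letting the last set absorb the remainder always yields six averages per subject and device, which is what the analysis of variance on pooled trials requires.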

For the time per entry data, an analysis of 
variance was employed to test for the dif- 
ferences between devices, between trials, and 
the interaction between devices and trials. 
The analysis of variance yielded the follow- 
ing results: there was a statistically signifi- 
cant overall difference between the devices in 
terms of average time per entry (p < .001), 
there was a statistically significant overall 
difference between the six sets of pooled 
trials (p < .001), and there was a significant 
interaction effect between the devices and the 
trials indicating a difference in performance 
curve profiles between the devices (p < .01). 
A summary of the results of the analysis of 
variance is presented in Table 2. 

The Scheffé method for multiple compari- 
sons (Scheffé, 1959) was utilized to compare 
average entry time of each device with every 
other device. The results of the test indicated 
the following: the 10-key device and the 
matrix device required significantly less time 
per entry than the lever and rotary knob 
devices (p < .01), there was no significant 
difference in average entry time between the 
lever device and the rotary knob device, and 
there was no significant difference in average 
entry time between the 10-key device and 
the matrix device. 

TABLE 2 

ANALYSIS OF VARIANCE OF TIME PER ENTRY 

Source of variation          df        MS          F 

Between devices (D)           3    48,986.16    78.98*** 
Between pooled trials (T)     5     4,525.40    53.18*** 
Between subjects (S)         23     3,785.40    90.52*** 
Interaction S X D            69       620.19     9.11*** 
Interaction S X T           115        85.09     1.25 
Interaction T X D            15       204.37     3.00** 
Interaction S X D X T       345        68.07     1.63* 
Within cells             15,946        41.82 

Total                    16,521 


The significant interaction between devices 
and trials indicated that the profiles of the 
average time for entry performance curves 
for the four devices varied significantly. 
A series of Scheffé multiple comparisons 
(Scheffé, 1959) were employed to explain 
the nature of profile differences. In examining 
the lever device, it was found that a signifi- 
cant reduction in average entry time occurred 
among the following sets of pooled trials: 
the first set with all other sets; the second 
set with Sets 4, 5, and 6; and the third set 
with Sets 5 and 6 (p < .05). For the 10-key 
device significant differences occurred among 
the first set with Sets 3, 4, and 5; and the 
second set with Sets 5 and 6 (p < .05). For 
the rotary knob device and matrix keyboard, 
significant differences occurred only among 
the first set with all other sets (p < .05). 
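The F ratios in Table 2 can be checked by dividing each mean square by an error term. The pairing of effects with error terms below is inferred from which quotients reproduce the printed F values; the article does not state it. Two mean squares are damaged in the scanned table and are read here as 48,986.16 and 4,525.40, which is an assumption supported by the same arithmetic.

```python
# Mean squares as read from Table 2.
mean_squares = {
    "D": 48986.16,      # between devices
    "T": 4525.40,       # between pooled trials
    "S": 3785.40,       # between subjects
    "SxD": 620.19,
    "SxT": 85.09,
    "TxD": 204.37,
    "SxDxT": 68.07,
    "within": 41.82,
}

# Effect -> error term, inferred from the printed F values (an assumption).
error_term = {
    "D": "SxD",
    "T": "SxT",
    "S": "within",
    "SxD": "SxDxT",
    "SxT": "SxDxT",
    "TxD": "SxDxT",
    "SxDxT": "within",
}

f_ratios = {
    effect: mean_squares[effect] / mean_squares[error]
    for effect, error in error_term.items()
}

for effect, f_value in f_ratios.items():
    print(f"{effect:6s} F = {f_value:.2f}")
```

Every quotient agrees with the printed column to within rounding, and the degrees of freedom in the table sum to the stated total of 16,521, which gives some confidence in the reconstructed rows.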


Analysis of the Device Preference Ranking 
Data 


The rank order preference data was sta- 
tistically analyzed by Friedman’s two-way 
analysis of variance (Siegel, 1956). The test 
indicated a significant overall difference in 
preference between the devices (p < .001). 
Individual tests between pairs of devices em- 
ploying a special form of Friedman’s two- 
way analysis of variance (Siegel, 1956) indi- 
cated as follows: the 10-key device was sig- 
nificantly preferred to all three other devices 
(p < .001); the matrix device, which was 
the second most preferred device, was sig- 
nificantly preferred to either the lever device 
or the rotary knob device (p<.01); and 
there was no significant difference in pref- 
erence between the rotary knob device and 
the lever device. 


DISCUSSION 


The results of this study have practical 
significance. Of the four devices evaluated, 
the 10-key device best satisfies the criteria 
of operator performance and preference for 
the application investigated. 

Error rate is the most critical of the three 
criteria considered since data entries into 
terminal devices do not undergo conventional 
verification procedures prior to reaching the 
computer. The 10-key device resulted in sig- 
nificantly fewer erroneous entries than the 
matrix, the rotary knob, and lever devices. 

The number of job reportings which would 
occur at a terminal device on a real produc- 
tion floor would usually have peak load 
periods at specific times during the work 
shift. Time required to perform a job report- 
ing therefore should be minimized to avoid 
queueing and reduce the amount of non- 
productive time employees stand in line when 
queues do occur. The 10-key device did not 
demonstrate a significant entry time advan- 
tage over the matrix but did require signifi- 
cantly less time per entry than the lever and 
rotary knob devices. On the basis of the 
case study information collected, this time 
saving would be of practical significance in 
most installations. 

As was stated in the Results section, a 
significant interaction was found between 
trials and devices in regard to entry time 
which was in part accounted for by con- 
tinued improvement with the lever device. 
It is hypothesized that the continued im- 
provement in reducing the time per entry 
with the lever device resulted from the sub- 
ject operators becoming more proficient in 
“dampening” the lever motion as they ap- 
proached the desired digit. In operating the 
lever device, there was a tendency to over- 
shoot the digit to be dialed which thereby 
required a correcting adjustment to insure 
proper detent of a lever. Improved dampen- 
ing action would tend to reduce the frequency 
and amount of corrective movement required. 
Learning, however, did not appear to be a 
practical criterion in this study. 

The 10-key device will be studied in fur- 
ther detail with reference to the utility of a 
display. There were indications that the 
display was more frequently used to deter- 
mine if the correct number of digits had 
been entered rather than to check the entry 
digit by digit for accuracy. For certain ap- 
plications a less costly display indicating 
whether or not the data fields have been 
filled may be used with no loss in accuracy 
and a cost saving in device construction. 


REFERENCES 

Edwards, A. L. Experimental design in psychologi-
cal research. New York: Rinehart, 1957. Pp. 195-
198.

Scheffé, H. The analysis of variance. New York:
Wiley, 1959. Pp. 78-110.

Siegel, S. Nonparametric statistics. New York:
McGraw-Hill, 1956. Pp. 166-172.


(Received August 18, 1961)


Journal of Applied Psychology 
1962, Vol. 46, No. 5, 337-340


NONRANDOM TENDENCIES IN INTERPOLATING 
BETWEEN END-POINTS 1


RICHARD C. SORENSON AND ARNOLD L. TOWE


Department of Physiology and Biophysics, University of Washington School of Medicine


6 Ss’ estimates of the position of a line within a 2-4 mm. interval were
analyzed and found to be nonrandomly distributed among the decimals of the 
interval, even though the original data were uniformly distributed. Each S was 
highly consistent in his pattern of decimal selection over time. The patterns 
for different Ss were remarkably similar; decimals near the end-points of the 
interval were chosen at the expense of those in the interior of the interval. The 
position of the end-points was largely responsible for the nonrandom selection 
observed, for the numbers “preferred” could be altered by altering the end- 
points. The S’s knowing of this phenomenon had some effect on the magnitude 
of the tendency, but did not eliminate it. 


Often, in converting experimental data from 
analogue to digital form, the observer records 
his scale readings, not to the nearest vernier 
mark, but to the nearest visually estimated 
decimal of the vernier interval. It has been 
suggested that at least two factors play a 
role in the selection of the estimated decimal, 
viz., the objective stimulus and the “habits 
and facilities people have acquired in the use 
of numbers” (McCormick, 1957, p. 97). Yule 
(1927) found that the distribution of the last, 
or estimated, decimal in such numerical data 
as census reports, grain weighings, thermom- 
eter readings, and anthropological measure- 
ments deviated significantly from a uniform, 
or random, distribution. Since “zero” occurred 
more than one-tenth of the time, he called 
it a “preferred digit.” Four, six, and seven, 
which occurred less than one-tenth of the 
time, he called “avoided digits.” Since Yule’s
work, others have made similar observations 
with respect to both numbers and letters 
(Horton & Mecherikoff, 1960; Smith, 1949). 
This paper will report the patterns of decimal 
estimations obtained by visual estimation of 
the position of a vertical line within a small 
interval; the data were a by-product of 
measurements from film strips of the latency 
of discharge of single neurons in the central 
nervous system (Towe & Amassian, 1958; 
Towe & Kennedy, 1961). Evidence will be 
presented that an additional factor, the position
of the end-points of the interval of
measurement, strongly influences the pattern
of decimal selection.


1 Supported by a grant (B 396) from the National
Institute of Neurological Diseases and Blindness,
Department of Health, Education, and Welfare,
United States Public Health Service.


METHOD 


Neuron discharge records were obtained by con- 
ventional microelectrode techniques; the electrical 
responses were photographically recorded from in- 
dividual sweeps of an oscilloscope face. The 35- 
millimeter film record was then displayed on a TDC 
project-or-view enlarger, which increased the image 
about six times. A 1-millisecond time scale, traced 
onto vellum, was placed over the enlarged record so 
that the zero mark on the time scale corresponded 
to the position of the stimulus artifact on the 
record. Each 1-millisecond interval was between 2 
and 4 millimeters long. The beginning of each 
spike discharge appeared on the record as an almost 
vertical line dropping or rising from the base line; 
at the sweep speeds normally used, the thickness 
of the line corresponded to .1-.2 millisecond on the 
time scale (see records in Towe & Amassian, 1958; 
Towe & Kennedy, 1961). The position of the leading 
edge of this line was estimated visually to the 
nearest .1 of the 1-millisecond vernier divisions; the 
vernier line, about .02 millisecond in thickness, was 
centered on the leading edge. Of about 300,000 such 
measurements taken at the neurophysiology labora- 
tories of the Department of Physiology and Bio- 
physics at the University of Washington over the 
past 6 years, 20,046 constituted our samples.

Thirty-three samples of the latency measurements 
made by six investigators were analyzed. In three of 
these samples, two or more persons had measured 
the same original data. It was assumed that the 
estimated values for the last, or .1 millisecond, 
digit should be randomly distributed over the 10 
decimals, because the original data appeared to be 
randomly distributed around a particular mean
latency. This latter conclusion was derived from
detailed observations of neuron response latencies
made with a crystal-controlled impulse-interval
timer with a digital read-out that was correct to
±20 microseconds (Towe & Amassian, 1958). It
will be clear from the results of this study that
the assumption was justified.


[Table 1. Average percentage of selection by six subjects, with average rank]


RESULTS 


Initially, the frequency of occurrence of 
each of the 10 decimals was tested against the 
assumed random, or uniform, distribution of 
occurrence by the chi square test, with 9 
degrees of freedom. Each of the 24 samples 
tested in this way differed from the expected 
distribution at the .01 level or greater. In 
each case, decimals near the end-points of 
the interval were selected more often than 
expected, while decimals in the interior of the 
interval were selected less often than ex- 
pected. These tendencies were as strong when 
the vernier interval was 4 millimeters wide as 
when it was only 2 millimeters. 
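
The uniformity test described above can be sketched as follows. The counts are hypothetical (they merely mimic an excess of readings near the end-points), and the name `chi_square_uniform` is introduced here for illustration only.

```python
# Chi-square goodness-of-fit test of final-digit frequencies against a
# uniform distribution over the 10 decimals (df = 9).
# The counts are hypothetical; index i holds the count for decimal .i

def chi_square_uniform(counts):
    """Chi-square statistic for observed counts vs. equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

counts = [200, 120, 90, 70, 60, 80, 60, 70, 100, 150]
CRITICAL_01_DF9 = 21.666   # tabled chi-square value, df = 9, alpha = .01

stat = chi_square_uniform(counts)
print(stat, stat > CRITICAL_01_DF9)
```

A statistic above the tabled value indicates that the digit frequencies depart from uniformity at the .01 level.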

Individual consistency. The pattern of 
decimal frequencies was highly consistent for 
each individual over time. Three of five in- 
vestigators never significantly “avoided” deci- 
mals that were initially significantly “pre- 
ferred.” On the other hand, two did change 
“preference” for one decimal during the 
course of their work; the decimal involved 
was at the middle of the interval. Spearman 
rank correlation coefficients of the decimal 
frequencies for successive samples from the 
same individuals varied from .44 in one person 
to .95 in another; the latter correlation was 
between measurements made 2 years apart.
The average Spearman rank correlation coeffi- 
cients for different individuals ranged from 
.52 to .82. 
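
As an illustration of the consistency measure used here, the following sketch computes a Spearman rank correlation between the decimal frequencies of two samples from one observer. Both samples are hypothetical, the helper names are assumptions, and the simple formula used assumes no tied frequencies.

```python
# Spearman rank correlation between two sets of decimal frequencies.
# Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid when there are no ties.
# Both samples are hypothetical percentages for decimals .0 through .9

def ranks(values):
    """Rank values from largest (rank 1) to smallest; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rho(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

sample_a = [21, 13, 9, 7, 6, 8, 5, 7.5, 10, 13.5]
sample_b = [19, 14, 8, 6.5, 7, 9, 6, 5, 11, 14.5]

rho = spearman_rho(sample_a, sample_b)
print(round(rho, 3))
```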

One investigator became aware of the pre- 
ponderance of zeros in his latency measure- 
ments and attempted to correct for the effect. 
Prior to this time, the percentage of zeros in 
his measurements had ranged from 20 to 23%.
Subsequently, 11 to 15% of his measurements 
were zeros. Even though the awareness had 
brought an effort to correct for the excess, the 
deviation of his subsequent measurements 
from the assumed random decimal frequency 
distribution remained significant beyond the 
.01 level.

Interindividual consistency. The pattern 
of decimal selection was strikingly similar 
from one person to the next. Table 1 shows
the average percentage of total measurements
that each decimal was selected by the six 
investigators. The Kendall coefficient of con- 
cordance (W), which measures the extent of 
association among several sets of rankings
(see Siegel, 1956), was employed to test the
degree of consistency of decimal selection
among the individuals. The value of W so
obtained, .64, was significant beyond the .001
level. (A value of .64 for W is about equiva-
lent to an average rank correlation of .56.)
The average frequency of decimal selection
by all six individuals is shown in Figure 1;
the expected frequency for each of the deci-
mals was .1. It is evident that zero, the
decimal occupied by the end-points of each
interval, was the most frequently selected
decimal. As indicated previously, decimals on
either side of the vernier end-point were also
selected more often than expected, while
decimals in the middle of the interval suffered
a corresponding decrease in selection below
the expected frequency. The decimal occupy-
ing the middle of the interval was selected
least often by only two individuals, while an-
other selected this decimal more often than
expected.
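
The coefficient of concordance and its relation to the average rank correlation can be sketched as follows. The three rankings are hypothetical and the function names are assumptions; the conversion used is the standard identity (kW - 1)/(k - 1) for k judges, and no tie correction is applied.

```python
# Kendall coefficient of concordance W for k judges ranking n items,
# and the equivalent average Spearman correlation between pairs of judges.
# The rankings are hypothetical; ties are assumed absent.

def kendall_w(rankings):
    k = len(rankings)        # number of judges
    n = len(rankings[0])     # number of items ranked
    rank_sums = [sum(row[j] for row in rankings) for j in range(n)]
    mean_sum = k * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (k ** 2 * (n ** 3 - n))

def average_rank_correlation(w, k):
    """Average pairwise Spearman rho implied by W for k judges."""
    return (k * w - 1) / (k - 1)

rankings = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
]
w = kendall_w(rankings)

# With the reported W of .64 and six investigators:
implied = average_rank_correlation(0.64, 6)
print(round(w, 3), round(implied, 3))
```

With W = .64 and six observers the identity gives roughly .57, in line with the text's "about" .56; the small difference presumably reflects rounding of W.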


Sequence of decimal selection. It was pos- 
sible to test the extent to which the previously 
selected decimal affected the next measure- 
ment. Pairs of adjacent measurements were 
studied for two individuals. The cell fre-
quencies differed from that expected
($E = (\Sigma_i \Sigma_j)/T$, the product of the row
and column totals divided by the grand total)
beyond the .01 level for both. One
investigator tended to avoid selecting the same
decimal on two successive measurements; the
other showed a highly significant tendency to
select a preferred decimal after a preferred
decimal and an avoided decimal after an
avoided decimal. These were only short-
range tendencies within each sample, however.


FIG. 1. Average percentage of frequencies of
decimal selection by six subjects. (The percentage at
0 has been arbitrarily plotted on both ends of the
interval.)


FIG. 2. Frequency distribution of decimal selection
by three subjects, with end-points at .0 and at .5.
(The end-point percentages have been arbitrarily
plotted on both ends. Solid line: end-point at .5;
dotted line: end-point at .0.)
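
The sequence analysis can be sketched as a contingency test on pairs of adjacent measurements, with expected cell frequencies equal to row total times column total divided by the grand total. The short sequence below is hypothetical and uses only two categories to keep the arithmetic visible; the function names are assumptions.

```python
# Test whether the previously selected decimal affects the next one.
# Adjacent pairs are tallied in a square table; each expected cell count is
# (row total * column total) / grand total, as in a chi-square test of
# independence. The sequence is hypothetical.

def transition_table(seq, n_categories):
    table = [[0] * n_categories for _ in range(n_categories)]
    for prev, nxt in zip(seq, seq[1:]):
        table[prev][nxt] += 1
    return table

def chi_square_independence(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:           # skip empty rows or columns
                chi2 += (observed - expected) ** 2 / expected
    return chi2

# An observer who never repeats the previous value:
sequence = [0, 1, 0, 1, 0, 1, 0, 1, 0]
table = transition_table(sequence, 2)
chi2 = chi_square_independence(table)
print(table, chi2)
```

For real data the table would be 10 by 10 (one row and column per decimal), and sparse cells would call for pooling before the chi-square approximation is trusted.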

Effects of changing the end-points of the
interval. To determine whether or not a
simple “number preference” was involved in
this phenomenon, the position of the end-
points of the interval was shifted by .5 milli-
second. Three subjects measured the same
data first in the normal manner, with a
vernier scaled as 0, 1.0, 2.0, 3.0 milli-
seconds, and then with a vernier scaled as
.5, 1.5, 2.5, 3.5 milliseconds. The results,
shown in Figure 2, clearly indicate that a
simple number preference hypothesis was in-
adequate. It is evident that the position of the
end-points of the interval played a major
role in the pattern of decimal selection; the
decimal five, the end-point of the interval in
the second type of measurement, replaced
zero as the highly preferred decimal. The
change in decimal selection occurred im-
mediately when the scale was changed by .5
millisecond. The probability that the original
data, taken over so many years on so many
different animals, could clump in this cycle is
so low that it may be taken as zero.


One individual measured a large amount
of data with a vernier scaled as 0, .5, 1.0, 1.5,
2.0, 2.5 milliseconds. Not only did the
decimal five, which had been an avoided
digit by the standard 1-millisecond scale,
become a highly preferred decimal, but the
frequency of selection of zero increased
greatly. The successive samples over time
when this scale was used were more highly
correlated than when the standard 10-decimal
interval was employed.


DISCUSSION 


The estimation of the position of a line 
within a 2-4 millimeter interval to the nearest 
one-tenth of that interval is subject to various 
systematic influences. From the present data 
it is concluded that the most important of these 
influences is the position of the end-points of 
the interval; the expected uniform frequency 
of selected decimals was not found, but rather 
the frequency of decimal selection increased 
near the end-points, at the expense of the 
midinterval decimals. That the particular 
numbers involved were not important was 
shown by the effects of shifting the end-points 
to a new position over the numbers—the 
pattern of frequency of decimal selection 
moved over the numbers with the end-points, 
but did not change in any other way. For a 
few individuals, the middle of the interval 
evidently plays a slight role in decimal selec- 
tion; one subject selected the central decimal 
more often than expected by the hypothesis
of a uniform decimal frequency distribution.
Although some individual number preference
was probably involved, the overriding ob-
servation was the similarity of the pattern
of number selection among individuals. Fur-
ther, when the original interval was sub-
divided into two equal parts, and the data
measured to the nearest one-fifth of the sub- 
interval, the same pattern of number selec- 
tion was found within the subinterval: deci- 
mals at the end-points of the subintervals 
were most often selected. It can thus be con- 
cluded that the end-points of the interval of 
visual estimation are a preponderant influence 
in the frequency with which the decimals 
within the interval are selected. 

It was a little surprising that the subjects’ 
knowledge of this influence had such a minor
effect on its magnitude. Intentional “correc-
tion” for the influence decreased, but did not 
erase, it. This finding is consistent with the 
presence of the same effect in an individual’s 
measurements made several years apart. This 
factor can probably be regarded as a perceptual 
phenomenon that must be reckoned with in 
the visual conversion of analogue data into dig- 
ital form. When large numbers of observations 
are involved, it probably has little influence 
on the computed mean of the data, but has 
its greatest effect on the size of the standard 
deviation computed from the data. Towe and 
Amassian (1958) found this to be true when 
the same data were measured both visually 
and with an automatic timer. The mean 
values calculated by the two methods usually 
fell within 50 microseconds of each other, 
whereas the standard deviations differed by 
three to five times (the visually converted 
data yielding the larger deviation). When the 
variation in the original data is much larger 
than the vernier interval employed in con- 
verting, this difference tends to disappear. 
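
The argument that an end-point tendency leaves the mean nearly unchanged while inflating the standard deviation can be illustrated with a deliberately simple model. Everything here is an assumption made for illustration: the eight "true" latencies, the rule that each reading drifts one decimal toward the nearer end-point, and the function names. It is not the authors' data or procedure.

```python
# Toy model: readings drift one decimal toward the nearer end-point.
# True latencies are in hundredths of a millisecond and lie symmetrically
# about the middle of the 5.0-6.0 interval, so the variation in the data is
# smaller than the vernier interval. All values are hypothetical.
from statistics import mean, pstdev

true_hundredths = [532, 538, 541, 547, 553, 559, 562, 568]

def read_unbiased(h):
    """Round to the nearest tenth of a millisecond (result in tenths)."""
    return (h + 5) // 10

def read_biased(h):
    """Unbiased reading pushed one decimal toward the nearer end-point."""
    t = read_unbiased(h)
    d = t % 10
    if 1 <= d <= 4:
        return t - 1
    if 6 <= d <= 9:
        return t + 1
    return t          # decimals .0 and .5 are left alone in this model

unbiased = [read_unbiased(h) for h in true_hundredths]
biased = [read_biased(h) for h in true_hundredths]

print(mean(unbiased), mean(biased))        # means agree
print(pstdev(unbiased), pstdev(biased))    # biased spread is larger
```

In this symmetric case the mean is untouched while the standard deviation grows by roughly two thirds; the size of the inflation depends entirely on the assumed drift rule and on where the data sit within the interval.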


REFERENCES 

Horton, D. L., & Mecherikoff, M. Letter prefer-
ences: Ranking the alphabet. J. appl. Psychol.,
1960, 44, 252-253.

McCormick, E. J. Human engineering. New York:
McGraw-Hill, 1957.

Siegel, S. Nonparametric statistics. New York:
McGraw-Hill, 1956.

Smith, M. H., Jr. Spread of effect is the spurious
result of nonrandom response tendencies. J. exp.
Psychol., 1949, 39, 355-368.

Towe, A. L., & Amassian, V. E. Patterns of activity
in single cortical units following stimulation of
the digits in monkeys. J. Neurophysiol., 1958,
21, 292-311.

Towe, A. L., & Kennedy, Thelma T. Response of
cortical neurons to variation of stimulus intensity
and locus. Exp. Neurol., 1961, 3, 570-587.

Yule, G. U. On reading a scale. J. Roy. Statist. Soc.,
1927, 90, 570-587.


(Received September 18, 1961)





Journal of Applied Psychology 
1962, Vol. 46, No. 5, 341-343 


VIGILANCE PERFORMANCE AS A FUNCTION 
OF PAIRED MONITORING 1


BRUCE O. BERGUM AND DONALD J. LEHR


United States Army Air Defense Human Research Unit, Fort Bliss, Texas 


2 experiments were performed to determine the effect of pairing of Os upon 
individual monitoring performances. Both studies employed 2 groups of 20 
Ss each. Group 1 consisted of paired monitors and Group 2 consisted of 
isolated monitors. Experiment I employed a rate of 24 signals/hr.; Experiment 
II employed a rate of 6 signals/hr. All Ss monitored a circular light display 
for a period of 90 min. Neither experiment indicated an overall facilitation of 
performance resulting from pairing, but both demonstrated significant relation- 
ships between performances of the members of the pairs. It was hypothesized 
that the degree of conversational interaction between members of the pairs 


might account for the observed effect. 


A few years ago Deese (1955) suggested an 
alternative to the theoretical conceptualiza- 
tions of “vigilance” performance that were 
then most current. The position he took was 
that of emphasizing the role of background 
sensory input in the maintenance of efficient 
monitoring behavior, a position that has since 
come to be known as the “activation” in- 
terpretation of vigilance. This interpretation 
has found support both among other students 
in the field and in the nature of the ac- 
cumulating body of evidence from research in 
vigilance. 

One implication of this theory that has 
received little investigation is the facilitation 
of individual monitoring performance to be 
anticipated from the presence of other moni- 
toring personnel in the immediate environ- 
ment. Thus, if performance level is a function 
of background sensory input, then it follows 
that insofar as the presence of other monitors 
constitutes an additional background sensory 
input, performance should be facilitated. 

The data on this point are not decisive. 
Shafer (1949) studied detection performance 
on an audio-visual sonar display and reported 
that two individuals working together pro- 
duced from 11 to 20% more detections than 
an individual working alone, and that three
individuals produced 6 to 15% more detec-
tions than two individuals. It should be noted,
however, that these results refer to the com-
bined performances of the monitors and not to
improvements in the performances of the
individual monitors.


1 The research reported in this study was per-
formed for the Human Resources Research Office
of George Washington University under contract
to the Department of the Army. The opinions ex-
pressed in this report are solely those of the authors
and do not necessarily reflect those of the sponsoring
agency.

Fraser (1953), on the other hand, reported
improvement in the performances of individ- 
ual monitors when the experimenter remained 
in the test room during a prolonged vigil, but 
in this case it is probable that the facilitation 
resulted as much from the authority figure 
presented by the experimenter as from any 
other source. 

The present studies were designed to deter- 
mine the effect of the presence of a second 
monitor in the same monitoring situation 
upon individual monitoring performance. It 
was predicted that the presence of a second 
monitor would significantly improve individ- 
ual detection performance over that of moni- 
tors working in isolation. 


EXPERIMENT I 


This experiment tested the effects of paired 
monitoring on performance of a task employ- 
ing a relatively high rate of signal presentation. 


Method


Subjects. A total of 40 National Guard trainees 
from the Army Training Center, Fort Bliss, served 
as the subjects in this experiment. 

Apparatus. Three isolation booths were employed 
in this study. Each booth was equipped with a 
circular panel, 13 inches in diameter, consisting of 
0'%4-inch red lamps which illuminated in sequence 
at a rate of 12 rpm. A signal consisted of the failure
of a lamp to illuminate in its normal sequence. The
displays were mounted vertically at seated eye height
on the rear walls of the booths and a small table
was located directly beneath the display and ad-
jacent to the wall. Room illumination was by a
shaded 40-watt frosted lamp mounted above and
behind the subject.

A preset program consisting of 24 signals per
hour (Mackworth, 1950) caused a signal to be
generated on the display and responses were made
by depressing a hand-held push button. Both signals
and responses were automatically recorded on paper-
tape recorders located in a central control area
external to the booths. The control area and three
booths were connected by a two-way intercom-
munication network.

Two response push buttons, each feeding to a
separate recording channel, were located in one of
the booths, and one push button was located in
each of the two remaining booths.

Conditions. All subjects worked continuously
through three contiguous periods of approximately 30
minutes each for a total of 90 minutes of work
without rest. The 40 subjects were randomly as-
signed to two groups of 20 subjects each. The
control group consisted of individuals working in
isolation; the experimental group consisted of pairs
of individuals working independently in the same
booth with freedom to converse about anything
but the occurrence of signals. Separate measures of
the frequency of correct signal detections were taken
on all subjects in both groups.


TABLE 1
PERCENTAGE OF CORRECT DETECTIONS FOR
PAIRED AND ISOLATED MONITORS
AT HIGH SIGNAL RATE

Time period

Condition Average
Isolated é 7 7 81
Paired 87 86

Average

Results 


The mean percentages of correct detections
for the two groups for each 30-minute
period are presented in Table 1. Both groups
demonstrated a decline in performance over
time periods with the mean performance of
the paired individuals tending to be slightly
superior overall. In order to determine the
significance of these group differences, a
Kruskal-Wallis one-way analysis of variance
was performed on these data. This analysis
yielded an H of 1.38 with an associated two-
tailed probability of .20 > p > .10. On the
basis of this analysis, pairing did not yield 
significantly superior performance. 
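
The group comparison can be sketched as follows. The detection percentages are hypothetical, the function names are assumptions, and the tie correction to H is omitted; the sketch only shows how the statistic is formed from the pooled ranks.

```python
# Kruskal-Wallis one-way analysis of variance by ranks for two groups.
# H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1), where R_j is the
# sum of pooled ranks in group j. No tie correction is applied.
# The scores are hypothetical percentages of correct detections.

def average_ranks(values):
    """Ranks from smallest (1) to largest; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = shared
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * h - 3.0 * (n_total + 1)

isolated = [70, 75, 80, 82, 85]
paired = [78, 84, 88, 90, 92]
CRITICAL_05_DF1 = 3.841    # tabled chi-square value, df = 1, alpha = .05

h_stat = kruskal_wallis_h([isolated, paired])
print(round(h_stat, 3), h_stat > CRITICAL_05_DF1)
```

With two groups H is referred to chi-square with one degree of freedom; for samples this small an exact table is the safer reference.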

To determine the specificity of the pairing 
effect on performance, a rank order correla- 
tion was performed between the members of 
the 10 pairs of individuals working together. 
This analysis yielded a rho of .709, p < .05. 
As a control for this analysis, a second rho 
was computed between individuals tested at 
the same time but in separate booths. This 
analysis yielded a rho of .109, p> .05. 
Finally, the detection scores for the two 
paired individuals were combined and these 
scores paired with the combined scores of the 
two individuals in separate booths for a given 
session. This yielded 10 pairs of scores for 
which a final rho was computed. The purpose 
of this analysis was to determine whether 
common factors were affecting the perform- 
ances of all subjects in a given session. This 
analysis yielded a rho of —.082, p > .05. 

These analyses suggest that whatever the 
effect of pairing was, it tended to be specific 
to a given pair of individuals, i.e., when one 
member of the pair did well, the other mem-
ber did well, and these results cannot be
accounted for by test situation artifacts. 


EXPERIMENT II


This experiment tested the effects of paired 
monitoring on a task employing a relatively 
low rate of signal presentation. 


Method


Subjects. A total of 40 National Guard trainees
from the Army Training Center, Fort Bliss, served
as the subjects in this experiment.

Apparatus. The apparatus for this experiment was
identical to that employed in Experiment I.

Conditions. The conditions for this experiment
were identical to those in Experiment I with the
exception that in this study the signal presentation
rate was six signals per hour.


Results 


The mean percentage of correct detection 
scores for the two groups for each 30-minute 
period are presented in Table 2. Both groups 
showed a decrement in performance over time 
periods, but neither group demonstrated a 
marked superiority over the other. A com-
parison of the groups, employing a Kruskal-
Wallis one-way analysis of variance, yielded
an H of 0.145, p > .05. The groups were
indistinguishable in terms of overall detection
performance.


TABLE 2
PERCENTAGE OF CORRECT DETECTIONS FOR
PAIRED AND ISOLATED MONITORS
AT LOW SIGNAL RATE

Time period

Condition Average

Isolated
Paired

Average

In order to determine whether the pairing 
variable resulted in a significant interaction 
between individuals working together despite 
the failure to obtain an overall effect, a rho 
was computed between the members of the
10 pairs of individuals working together. This 
analysis yielded a rho of .773, p< .01. As 
a control, a second rho was computed between 
individuals tested at the same time but in 
separate booths. This analysis yielded a rho 
of —.152, p > .05. A rho between the com-
bined scores for paired individuals and the
combined scores for the individuals in sepa-
rate booths for a given test session was also
computed. The value of this rho was .115, 
p> .05. These results are very similar to 
those found in Experiment I and tend to 
further confirm the conclusion that the effects 
of pairing are specific to the pairs involved. 


DISCUSSION 


The results of the two studies combined 
failed to support the hypothesized overall 
facilitation of individual performance pre- 
dicted by the activation interpretation of 
vigilance behavior. The results from Experi- 
ment I tended in the right direction, but 
the results from Experiment II lent no sup-
port at all to such an interpretation. 

In contrast, however, the significant rela- 
tionships demonstrated between the perform- 
ances of the paired individuals indicate that 
pairing does affect individual performance in 
some unspecified way.

The analyses indicate that whatever is
happening, it is not an artifact of the ap- 
paratus or procedures and, because the sub- 
jects knew they were being monitored over 
the intercommunication system, it is improb- 
able that the effect was the result of cheating 
between members of the pairs. The failure 
to find a significant overall effect is further 
evidence in support of this argument. If signal 
information were being exchanged, the effect 
would be to raise the overall detection per- 
formance for the group. 

A possible interpretation of the results that 
would still be in line with an activation 
interpretation might be that the effects of 
environmental stimulation are conditional 
upon the degree of stimulation. Thus, for ex- 
ample, when stimulation was relatively mild 
its effect might be facilitative. Beyond some 
point, however, irrelevant stimulation (in- 
tense conversation, for example) might be 
distracting and interfere with performance. 

While such information was not recorded 
nor correlated with performance in the present 
studies, it was apparent in monitoring the 
conversations of the paired subjects that 
considerable variation occurred among the 
pairs in terms both of the amount and in-
tensity of conversation. Since considerable
differences exist in the
operant conversational levels of individuals,
if the amount of conversational interaction
between members of the pairs is in fact
related to the combined performance of the
pairs, such “personality” differences might
prove to be a fruitful avenue for future re-
search in the area of vigilant behavior.


REFERENCES 
Deese, J. Some problems in the theory of vigilance.
Psychol. Rev., 1955, 62, 359-368.
Fraser, D. C. The relation of an environmental
variable to performance in a prolonged visual
task. Quart. J. exp. Psychol., 1953, 5, 31-32.
Mackworth, N. H. Researches on the measurement
of human performance. Med. Res. Council spec.
rep. Ser., 1950, No. 268.
Shafer, T. H. Detection of a signal by several ob-
servers. USN Electron. Lab. res. Rep., 1949,
No. 101.

(Received Septembe: 





Journal of Applied Psychology 
1962, Vol. 46, No. 5, 344-349 


INTERVIEWER CONSISTENCY IN THE USE OF EMPATHIC 
MODELS IN PERSONNEL SELECTION 1


DANIEL SYDIAHA 


University of Saskatchewan 


Personnel selection interviewers predicted the responses of applicants to 2 
paper and pencil tests. Comparisons were made between these predictions 
and the responses of applicants (accuracy score), predictions and the re- 
sponses of interviewers (assumed similarity score), and the responses of 
applicants and interviewers (similarity score). (These scores were referred 
to collectively as empathy scores.) Statistical analysis of empathy scores and 
their components indicated marked inter-interviewer inconsistency, which was 
interpreted to mean that interviewers tend to make errors by resorting to 
empathy as a basis of decision making. Explicit, actuarial bases of decision 
making in interviewing are advocated. Results of a follow-up study supported 


this interpretation. 


While it is generally acknowledged that the 
interview is an unreliable technique of per- 
sonnel selection, the sources of error associ- 
ated with the interview are open to question. 
This paper is the third of a series of investiga- 
tions (see Sydiaha, 1959, 1961), all of which 
explore possible mechanisms of decision mak- 
ing in interviewing. The aim of the work is 
essentially descriptive in that it attempts to 
account statistically for decision variance, but 
it is also evaluative in that it attempts to
separate those aspects of the decision making process
which are consistently applied by a number of
interviewers from those aspects which are not.
The frame of reference adopted is essentially 
that described by Hammond (1955) and 
Ferguson (1951) who advocate an extension 
of the usual functional analysis in problems
of prediction to include the clinician as well as
the test as part of the measuring device.


1 Financed by Defence Research Board of Canada,
Grant No. 9435-53 to E. C. Webster, McGill Uni-
versity. Webster directed the research and Areta
Crowell assisted with computations. Some of the
computations were performed by a Royal McBee
LGP-30 electronic computer at the University of
Saskatchewan.

The author gratefully acknowledges the assistance
of members of the Canadian Army Personnel Selec-
tion Service.

All tables cited in this paper have been deposited
with the American Documentation Institute. Order
Document No. 7258 from ADI Auxiliary Publica-
tions Project, Photoduplication Service, Library of
Congress; Washington 25, D. C., remitting in advance
$1.25 for microfilm or $1.25 for photocopies. Make
checks payable to: Chief, Photoduplication Service,
Library of Congress.

The term “model” in the title of this paper 
is intended to emphasize the fact that the 
decision making mechanism being examined 
here is hypothetical only, and does not cor- 
respond in any complete sense to a descrip- 
tion of the phenomenal experience of the 
interviewer. The arguments for discarding 
phenomenal or introspective evidence will not 
be considered in detail here, and it may be 
sufficient to mention that the approach stems 
from Brunswik’s view that perception is not 
a fully rational process (Postman & Tolman, 
1959). Also Sarbin, Taft, and Bailey (1960) 
assume that in clinical inference many cues 
are inaccessible to examination. In principle, 
the procedure adopted is to have the ex- 
perimental subject (interviewer) perform a 
task at the time he makes a decision. The 
task performance represents an objective, 
operational measure of the decision making 
model, the adequacy of which is determined 
by correlating performance obtained with the 
decisions made. The selection of tasks follows 
Brunswik’s principle of “representative 
design,” i.e., tasks approximate situations 
which are relevant to the interviewer’s field 
of experience. The result is thus a compromise 
between subjective and objective methods: the 
task is suggested by the phenomenal experi- 
ence of the interviewer, although the task 
as given to the interviewer is explicit and 
operationally defined in terms of a test, thus 




meeting the requirements of scientific method. 
Admittedly such a model can never be said to 
be a “real” decision making mechanism. How- 
ever, if the model adequately accounts for 
decision variance and suggests a more con- 
sistent and valid basis of selection, then such 
apparent or face validity is obviously only 
a minor consideration. 

Selection of “empathy” as a decision mak- 
ing model stemmed from the observation 
that interviewers tend to “place themselves 
in the shoes” of applicants to “try to under- 
stand the applicant’s motives in applying for 
a job.” For interviewers with a strong “clin- 
ical” orientation, such tendencies may be 
developed explicitly as a matter of course, 
but even in the absence of such an orienta- 
tion, the tendency may still exist. Particularly 
if documentary evidence is either inadequate 
or conflicting, the interviewer may resort to 
empathy in the sense of predicting the ap- 
plicant’s feelings, attitudes, and the like. 

Three sets of test responses provide the 
basic information, namely, the responses of a 
subject (applicant, designated O for Other), 
the predictions of a judge (interviewer, 
designated J) in which J predicts the responses 
of O, and the responses of J to the test, i.e., 
his self-description. From these three sets of 
responses, three sets of difference scores are 
derived which define the three empathy 
measures under investigation (see Cronbach, 
1955): 

1. The Accuracy score involves a comparison 
between the responses of O and the predictions 
of J. $X_{oi}$ designates the response ($X$) of 
person ($o$) to item ($i$) and $Y_{oji}$ the 
prediction ($Y$) of judge ($j$) as to how he 
thinks $o$ would respond to $i$. Then for a set 
of $k$ items and $N$ others, the overall 
Accuracy score will be: 

$$\frac{1}{kN} \sum_{o=1}^{N} \sum_{i=1}^{k} (Y_{oji} - X_{oi})^2$$

2. Assumed Similarity involves a comparison 
between the predictions of J and his 
self-description. If $X_{ji}$ designates the 
response of J to $i$, then for a set of $k$ items 
and $N$ others, the overall Assumed Similarity 
score will be: 

$$\frac{1}{kN} \sum_{o=1}^{N} \sum_{i=1}^{k} (Y_{oji} - X_{ji})^2$$

3. Similarity involves a comparison 
between the responses of O and J: 

$$\frac{1}{kN} \sum_{o=1}^{N} \sum_{i=1}^{k} (X_{oi} - X_{ji})^2$$


(It should be noted that the word empathy 
is used to refer to all three scores defined 
above. Although the word is sometimes used 
in a more limited sense to refer specifically 
to the Accuracy score, it is used in the more 
general sense here in the absence of any 
known term which includes all three 
measures.) 
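
The three scores can be illustrated with a minimal sketch. It assumes each score is a mean squared difference over items and others, in the manner of Cronbach (1955); under that assumption a lower value means greater accuracy or similarity. The function names and the toy responses are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence

Profile = Sequence[float]  # one respondent's answers to the k items


def mean_squared_difference(a: Sequence[Profile], b: Sequence[Profile]) -> float:
    """Average of (a - b)^2 over all N others and all k items."""
    n_others = len(a)
    k_items = len(a[0])
    total = 0.0
    for row_a, row_b in zip(a, b):
        for value_a, value_b in zip(row_a, row_b):
            total += (value_a - value_b) ** 2
    return total / (k_items * n_others)


def empathy_scores(
    others: Sequence[Profile],       # self-descriptions of the N applicants (O)
    predictions: Sequence[Profile],  # J's prediction of each applicant's answers
    judge_self: Profile,             # J's own self-description
) -> Dict[str, float]:
    """Assumed form of the three difference scores (lower = closer)."""
    judge_rows = [judge_self] * len(others)  # J's answers repeated per applicant
    return {
        "accuracy": mean_squared_difference(predictions, others),
        "assumed_similarity": mean_squared_difference(predictions, judge_rows),
        "similarity": mean_squared_difference(others, judge_rows),
    }


# Invented toy data: two applicants, three items.
others = [[1, 0, 1], [0, 0, 1]]
predictions = [[1, 1, 1], [0, 1, 1]]
judge_self = [1, 1, 0]
scores = empathy_scores(others, predictions, judge_self)
```

In this toy case the judge's predictions lie closer to his own self-description than the applicants' actual answers do, which is the kind of discrepancy between Assumed Similarity and Similarity examined later in the paper.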

Quite apart from their relevance to deci- 
sion making, empathy scores are of potential 
importance as sources of error in the form of 
interjudge inconsistency. If each of the scores 
is expanded algebraically, then a number of 
components can be identified and each com- 
ponent has distinct psychological meaning. 
The details of the argument are not re- 
peated here (see Cronbach, 1955); it is suf- 
ficient to note that 23 separate components 
can be identified, of which 10 occur in more 
than one score. These components occur as 
measures of response set (mean score), dis- 
persion (variance), and covariation (correla- 
tion coefficient). In other words, high empathy 
scores are associated with optimum con- 
ditions of response set, response variance, 
and covariance among the score components. 
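
As a simplified illustration of how such components arise, consider a single judge $j$ and a single other $o$, and assume the score is a mean squared difference over the $k$ items. This is only a sketch of the general idea; Cronbach's (1955) full decomposition across judges and others is finer. Here $\bar{Y}$ and $\bar{X}$ are item means, $s_Y$ and $s_X$ are standard deviations over items (dividing by $k$), and $r_{XY}$ is the correlation between predictions and responses over items.

```latex
\frac{1}{k}\sum_{i=1}^{k} (Y_{oji} - X_{oi})^2
  = \underbrace{(\bar{Y}_{oj} - \bar{X}_{o})^2}_{\text{response set (means)}}
  + \underbrace{s_Y^2 + s_X^2}_{\text{dispersion (variances)}}
  - \underbrace{2\, r_{XY}\, s_Y\, s_X}_{\text{covariation (correlation)}}
```

The score is smallest when the two means agree, the dispersions are matched, and the correlation is high, which corresponds to the "optimum conditions" referred to above.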

The question which follows from this 
analysis is whether these components are 
equal in magnitude for a sample of inter- 
viewers. Cronbach (1955) has shown that 
such differences do, in fact, exist. The pur- 
pose of this study was to examine whether 
such differences were important in accounting 
for individual differences in decision making. 


PROCEDURE 
Subjects. The samples obtained are 
described in detail elsewhere (Sydiaha, 1959). Each 
of eight Canadian Army Regular Force personnel 
officers interviewed from 14 to 50 regular force 
applicants for the Canadian Army. Total N for the 
project was 256, but empathy data were missing 
for two cases. 

Empathy Data. Two test forms were used: a 
44-item questionnaire of Introversion-Extraversion 
(I-E), and a 140-item form of the Semantic 
Differential (SD). The I-E items are described by 
Freyd (1924) and also appear in Morgan (1956). 
This particular test was selected from many others 
which were available on grounds that it differed 
somewhat in form and content from the MMPI, 
which is routinely administered to Army applicants. 
Since each personnel officer had the applicant’s 
MMPI answer sheet available during the interview, 
it was necessary to use a different questionnaire 
for this project to avoid artificially high Accuracy 
scores through direct knowledge of the applicant’s 
responses. The SD was made up of 10 scales, each 
applied to 14 concepts. (The scales were: cold-hot, 
valuable-worthless, red-green, small-large, fast-slow, 
dirty-clean, weak-strong, tasty-distasteful, deep- 
shallow, sharp-dull. The concepts were: my father, 
leadership, my work record, me, my mother, being 
a soldier, girl-friend, a steady job, being on my 
own, liquor, the way I look, friends, taking orders, 
worry.) Scales were selected to represent each of 
the three factors of potency, activity, and evaluation 
(see Osgood, Suci, & Tannenbaum, 1957) although 
intercorrelations computed from a sample of data 
obtained indicated that intrafactor scale correlations 
were no higher than were interfactor scale correla- 
tions. The 14 concepts were selected in consultation 
with Army personnel staff and were intended to 
include concepts having some relevance to Army 
adjustment. 

Upon termination of each interview, and as soon 
after the interview as possible, the interviewing 
officer was instructed to complete one copy of 
both tests according to his impressions of the way 
he believed the applicant would describe himself. 
Each applicant was also instructed to complete the 
two forms in terms of his own reactions. The 
interviewer had no knowledge of the applicant’s 
responses until his decision about the case had been 
made. Upon completion of the project, i.e., after all 
data had been collected in this way, each of the 
eight officers was then instructed to complete 
self-descriptions for the two tests. 

Correlation of Empathy Scores with 
Acceptance-Rejection. The design for correlational 
analysis was essentially that described in previous 
papers (Sydiaha, 1959, 1961). The cases were divided 
into three separate groups to permit cross-validation 
of findings. Cases provided by Officers A, B, F, and G 
(N = 37, 50, 50, and 41) were randomly assigned 
to a criterion group (N = 88) and a holdout group 
(N = 90). The remaining cases, i.e., those provided 
by Officers C, D, E, and H (N = 14, 23, 22, and 18), 
made up a second holdout group (N = 78). 

As in previous work, the general statistical 
procedure followed was to combine scores 
mathematically in such a way as to maximize the 
correlation between scores and acceptance-rejection 
for the criterion group, and the scoring key so 
derived was then applied to the holdout group to 
determine the relation between scores and 
acceptance-rejection. This procedure was followed 
separately for the Accuracy, Assumed Similarity, and 
Similarity scores. 

Also repeating the design of previous work, two 
sets of scoring keys were developed in order to 
assess criterion variance specific to and common to 
officers. Thus “individual scoring keys” were based 
on item analysis of each of the criterion groups 
A, B, F, and G, while “group scoring keys” were 
based upon item analysis of the combined criterion 
group A+B+F+G. 
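
The criterion-holdout design described above can be sketched as follows. The item-analysis rule used here (unit weights given by the sign of each item's point-biserial correlation with the decision, with weak items dropped) is an assumption made purely for illustration; the paper does not spell out its keying rule, and all names and data below are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def point_biserial(scores: Sequence[float], accepted: Sequence[int]) -> float:
    """Pearson r between a continuous score and a 0/1 accept-reject decision."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_a = sum(accepted) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(scores, accepted))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_a = sum((a - mean_a) ** 2 for a in accepted)
    if var_s == 0 or var_a == 0:
        return 0.0  # no variation: treat the correlation as zero
    return cov / sqrt(var_s * var_a)


def derive_key(items: Sequence[Sequence[float]], accepted: Sequence[int],
               threshold: float = 0.2) -> List[int]:
    """Assumed keying rule: weight each item +1, -1, or 0 by its item-decision r."""
    key = []
    for i in range(len(items[0])):
        r = point_biserial([row[i] for row in items], accepted)
        key.append(0 if abs(r) < threshold else (1 if r > 0 else -1))
    return key


def apply_key(items: Sequence[Sequence[float]], key: Sequence[int]) -> List[float]:
    """Total keyed score for each case."""
    return [sum(w * v for w, v in zip(key, row)) for row in items]


# Invented per-item difference scores; 1 = accepted, 0 = rejected.
criterion_items = [[1, 0, 3], [2, 1, 3], [4, 1, 3], [5, 0, 3]]
criterion_accepted = [1, 1, 0, 0]
holdout_items = [[1, 1, 3], [5, 0, 3], [2, 0, 3], [4, 1, 3]]
holdout_accepted = [1, 0, 1, 0]

key = derive_key(criterion_items, criterion_accepted)   # fitted on criterion group only
holdout_r = point_biserial(apply_key(holdout_items, key), holdout_accepted)
```

Deriving the key on one group and correlating the keyed scores with decisions in another is what guards against capitalizing on chance in the item weights.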

Analysis of Empathy Score Components. Analysis 
of components was confined to cases provided by 
Officers A, B, F, and G. (The analysis was found 
to be laborious and time consuming, even with the 
assistance of an electronic computer. In any case, 
the results with this partial sample were conclusive 
such that analysis of the remaining cases was not 
warranted.) Components which took the form of 
mean scores or differences between mean scores were 
tested by analysis of variance with one-way 
classification. Variance components were tested by 
Bartlett’s test of the homogeneity of variance and 
correlations were tested by the method described 
by Snedecor (1948). 
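
The test of whether correlations differ among samples can be illustrated with a standard procedure based on Fisher's z transformation. Whether this matches in detail the method the author took from Snedecor is an assumption; the function name and the numbers are hypothetical.

```python
from math import atanh
from typing import Sequence, Tuple


def correlation_homogeneity_chi2(rs: Sequence[float], ns: Sequence[int]) -> Tuple[float, int]:
    """Chi-square statistic for equality of several independent correlations.

    Each r is transformed to Fisher's z, weighted by (n - 3), and compared
    with the weighted mean z. Under the hypothesis of equal correlations the
    statistic is approximately chi-square with (number of samples - 1) df.
    """
    zs = [atanh(r) for r in rs]
    weights = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    chi2 = sum(w * (z - z_bar) ** 2 for w, z in zip(weights, zs))
    return chi2, len(rs) - 1


# Invented correlations for four officer samples of 40 cases each.
chi2, df = correlation_homogeneity_chi2([0.8, 0.1, 0.3, 0.6], [40, 40, 40, 40])
CRITICAL_05_DF3 = 7.815  # tabled chi-square value at the .05 level, 3 df
differ_significantly = chi2 > CRITICAL_05_DF3
```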

In testing for differences in components among 
interviewers, two questions were involved: (a) Is 
the component equal in magnitude for all 
interviewers? (b) If the component is equal across 
interviewers, is there a difference in magnitude 
between “accept” and “reject” cases? In analysis of 
variance terminology the first question would involve 
testing for significant interactions between 
interviewers and decision categories, and the second 
question would involve testing for main effects or 
differences between decision categories. Consistent 
with such reasoning, the appropriate test would 
involve procedures analogous to analysis of variance 
with two-way classification, with rows representing 
decision categories and columns representing 
interviewers. 

While such a procedure was possible for testing 
the components involving mean scores, it was not 
possible for the variance or correlation components 
since there were no known tests available which 
were appropriate. Consequently, an indirect method 
of testing for interaction was adopted as follows: 
cases from each of the four officers were divided 
into accept and reject samples, making eight samples 
in all. Differences among components were tested 
separately for the four accept samples and the 
four reject samples. If differences among either 
accept or reject samples were found to be 
significant, then this was considered grounds for 
concluding that interactions were present in the data; 
i.e., the magnitude of the given component was 
greater for some interviewers than for others, 
regardless of whether accept or reject samples were 
involved. If differences were not significant, then 
an overall test among the eight samples was 
calculated as a test of main effects; i.e., of 
decision category differences. 


RESULTS 


Correlations between Empathy Scores and 
Acceptance-Rejection 


Correlations (point-biserial) between empathy 
scores and acceptance-rejection showed 
no uniformities across interviewers. The 
range of the 54 correlations calculated was 
from −.45 to .84 (see Table A; 11 of the 
correlations were statistically different from 
zero). Considering the 12 sets of scores 
involved (3 empathy variables × 2 tests × 2 
scoring keys), 5 produced intersample 
correlation differences which were statistically 
significant. The remaining seven also pro- 
duced marked sample differences which only 
failed to reach significance because of the 
small number of cases involved. This absence 
of consistency applied to all three measures 
and to both the I-E and SD data. 

The conclusion suggested by these results 
was that empathic processes, as a basis for 
decision making, tended to be highly specific 
to certain interviewers. Thus considering the 
I-E items, significant correlations were found 
for Accuracy scores for Officer B (.84 and 
.63),² Similarity scores for Officers E (.48) 
and G (.64 and .59), and Assumed Similar- 
ity scores for Officers B (.49 and .68), 
E (.53), and G (.62 and .49). In the case 
of SD items, a significant correlation was 
obtained for the Assumed Similarity scores 
for Officer G only (.68 and .05). This tend- 
ency for correlations to be specific to some 
interviewers was supported by the fact that 
there were more significant correlation dif- 
ferences among individual scoring keys 
(four) than among group scoring keys (one). 

It should also be pointed out that, in 
general, individual scoring keys did not pro- 
duce uniformly higher results compared with 
group scoring keys. Assuming that these two 
sets of keys reflected criterion variance which 
was specific to and common to interviewers, 
respectively, then the results implied the ab- 
sence of any appreciable amount of criterion 
variance common to interviewers. Criterion 
variance in the data tended to be confined 
to from one to three officers, depending upon 
the measure involved, such that the scoring 
keys remained relatively unaffected by the 
pooling of cases used in the group scoring 
procedures. 

The results also indicate another interest- 
ing source of error, namely, the tendency to 
attribute characteristics to Others which, in 


2 In this paragraph, two correlation coefficients 
cited for specific officer samples refer to individual 
and group-scoring keys, in that order. Single coeffi- 
cients cited refer to group-scoring keys. 


fact, the Others failed to attribute to them- 
selves. Such unwarranted description can 
probably be thought of as a form of “projec- 
tion.” Evidence for this tendency was seen 
in discrepancies between Similarity and As- 
sumed Similarity scores which represented, 
respectively, self-descriptions made by O and 
descriptions attributed to O by J. Thus in 
two instances in Table A, correlations were 
significant for Assumed Similarity scores but 
not for Similarity scores (.49 versus .12 for 
Officer B using I-E data and .68 versus .33 
for Officer G using SD data). It would ap- 
pear in these two instances that although 
acceptance tended to be associated with the 
degree of similarity between applicant and 
interviewer, this similarity was more apparent 
than real. 

Such a discrepancy did not always occur, 
however, since in other instances the 
correlations were of the same order of magnitude 
(for I-E data the results were .53 versus .48 
for Officer E, and .49 versus .59 for 
Officer G). 


Analysis of Components 


Detailed discussion of the results is omitted 
here to conserve space. A summary of the 
results is shown in Tables B, C, and D. A 
large proportion of the empathy score com- 
ponents showed statistically significant dif- 
ferences between samples: of the 54 separate 
components analyzed only 11 did not produce 
significant differences between samples of 
either reject or accept cases. Of the 43 dif- 
ferences which were significant, 33 were sig- 
nificant for both accept and reject samples, 
and 11 were significant for one or the other 
but not both. It would appear from these 
results that there were marked discrepan- 
cies among interviewers in the components 
making up the three empathy scores. 

Of the 11 components which gave no sig- 
nificant differences among accept cases or 
among reject cases, overall tests of signifi- 
cance were conducted among all samples to 
test for accept-reject differences. No significant 
differences were obtained from these 
overall tests; it was concluded that these 11 
components were equal in magnitude for 
decision categories as well as for individual 
interviewers. 







It was concluded, therefore, that there 
were no components which gave both compa- 
rable results for all interviewers and different 
results between decision categories. 

DISCUSSION 

The results of this study clearly suggest 
that empathic models reflect marked incon- 
sistency among interviewers. Empathy scores 
account for decision variance for some inter- 
viewers only, and none of the empathy score 
components showed decision-category differences 
in the absence of interviewer-category 
interaction. Some evidence was 
obtained of the tendency to “project” 
unwarranted characteristics onto applicants and to 
distinguish between accept and reject cases 
on this basis. 

Such results imply that there is consider- 
able danger in resorting to empathy as a 
basis of decision making in selection. While 
there may be some apparent gain in addi- 
tional cues by doing so, this gain would 
appear to be offset by the fact that an em- 
pathic basis of decision making may be in- 
consistent from one interviewer to another. 
These results also argue for the practice of 
putting the decision on an explicit, actuarial 
basis, rather than leaving it to the “intui- 
tion” or “common sense” of the interviewer 
in which case the decision making cues are 
unspecified, unknown, or specific to the inter- 
viewer. To the extent that tendencies to 
empathize occur, then inter-interviewer in- 
consistencies are bound to result. Effective 
decision making in selection can be achieved 
only through the explicit delineation and 
combination of reliable and valid predictors. 


FOLLOW-UP STUDY 


The validity of all variables described in 
this and preceding papers was determined by 
correlating scores with a performance criterion 
for all cases inducted into the Army.³ 


A dichotomous re-enlistment criterion was 
used: soldiers were classified either as having 


3 The variables analyzed were: I. Assumed 
Similarity, SD data; II. Assumed Similarity, I-E data; 
III. Similarity, SD data; IV. Similarity, I-E data; 
V. Accuracy, SD data; VI. Accuracy, I-E data; 
VII. Bales’ Applicant Index; VIII. Bales’ Interviewer 
Index; IX. Bales’ Interview Duration Index; 
X. Statistical Score; XI. Clinical Score. 




performed sufficiently well to warrant re- 
enlistment, or as not having done so. This 
criterion was intended to reflect Army ad- 
ministrative practice; the classification was 
based upon existing Army documents and 
was performed by Army personnel with no 
prior knowledge of any of the scores being 
validated. There were 161 cases in all, 89 
classified as suitable for re-enlistment and 72 
as not suitable. 

Follow-up data were collected in June 
1960, approximately 3 years after induction, 
such that most subjects would have normally 
completed their commitment. Soldiers con- 
sidered suitable for re-enlistment included: 
(a) soldiers who had been re-enlisted; 
(b) soldiers released voluntarily but who 
were judged suitable for re-enlistment by two 
personnel officers at Army Headquarters, 
Ottawa; and (c) soldiers who had not com- 
pleted their tour of duty, but who were 
judged suitable by the Commanding Officer 
and the Personnel Officer of the units to 
which they were attached. Soldiers considered 
not suitable for re-enlistment included 14 
who were released on grounds of medical 
disability. 

Two sets of scores were validated: the 
first being the original scores based upon the 
induction criterion (acceptance-rejection) and 
the second set based upon the re-enlistment 
criterion. For the latter, item analyses were 
performed for all 11 variables on a sample 
of cases (N = 57) and cross-validated on 
the remaining cases (N = 104). 

The results appear in Table E. Validity 
was uniformly low for 
all variables and for both sets of scores. The 
range of correlations (point-biserial) was 
from −.08 to .28. Only 4 of the 22 correlations 
reached statistical significance. These 
were: Assumed Similarity, SD data (r = 
.19); Bales’ Applicant Index (r = .28); 
Bales’ Interviewer Index (r = .21); and the 
Clinical score (r = .16). Assumed Similarity, 
SD data, was based upon the re-enlistment 
criterion, and the other three upon the induction 
criterion. 

Validity coefficients were not uniform 
across officer samples. (Interofficer compari- 
sons were made only for scores based upon 
the induction criterion.) Interofficer sample 







differences in validity were statistically sig- 
nificant for 4 of the 11 variables and were 
quite marked for the other 7. The range of 
77 correlations calculated was −.39 to .72, 
of which 6 were statistically significant. It is 
important to understand, however, that this 
inconsistency among officers did not apply 
to their performance in screening successful 
soldiers: the proportion of applicants clas- 
sified as suitable for re-enlistment did not 
differ significantly among officers. In other 
words, the results obtained do not neces- 
sarily invalidate the interview for selection 
purposes. Rather, the lack of consistency 
among validity coefficients reflects sources of 
error in the interview process which are relevant 
to the performance criterion. Although the 
variables analyzed in these studies do not 
provide a means of predicting Army per- 
formance on an institution-wide basis, they 
would appear to be useful in the training of 
interviewers, insofar as they reveal sources 
of inconsistency among interviewers. 

This follow-up study, therefore, confirms 
the results of the empathy study above in 
emphasizing the need for putting decision 
making in selection on an explicit, statistical 


basis, rather than on the basis of “apparent 
relevance.” 




REFERENCES 

Cronbach, L. J. Processes affecting scores on 
“understanding of others” and “assumed 
similarity.” Psychol. Bull., 1955, 52, 177-193. 

Ferguson, G. A. Approaches to the experimental 
study of the Rorschach test. Canad. J. Psychol., 
1951, 5, 137-166. 

Freyd, M. Introverts and extroverts. Psychol. Rev., 
1924, 31, 74-87. 

Hammond, K. R. Probabilistic functioning and the 
clinical method. Psychol. Rev., 1955, 62, 255-262. 

Morgan, C. T. Student’s workbook to accompany 
introduction to psychology. New York: McGraw-Hill, 
1956. 

Osgood, C. E., Suci, G., & Tannenbaum, P. W. 
Measurement of meaning. Urbana: Univer. Illinois 
Press, 1957. 

Postman, L., & Tolman, E. C. Brunswik’s 
probabilistic functionalism. In S. Koch (Ed.), 
Psychology: A study of a science. Vol. 1. New York: 
McGraw-Hill, 1959. Pp. 502-564. 

Sarbin, T. R., Taft, R., & Bailey, D. E. Clinical 
inference and cognitive theory. New York: Holt, 
Rinehart, & Winston, 1960. 

Snedecor, G. W. Statistical methods. (4th ed.) 
Ames: Iowa State Coll. Press, 1948. 

Sydiaha, D. On the equivalence of clinical and 
statistical methods. J. appl. Psychol., 1959, 43, 
395-401. 

Sydiaha, D. Bales’ interaction process analysis of 
personnel selection interviews. J. appl. Psychol., 
1961, 45, 393-401. 

(Received September 26, 1961) 





Journal of Applied Psychology 
1962, Vol. 46, No. 5, 35 35 


PERSONALITY VARIABLES IN UNION-MANAGEMENT 
RELATIONS ¹ 


ROSS STAGNER 


Wayne State University 


To test the hypothesis that personalities of key figures have a significant 
impact upon the course of union-management relations at the plant level, 
quantitative data were gathered on the relationship in 33 tool-and-die shops. 
Each manager and steward provided personality data (Guilford-Zimmerman) 
and judgments of best and poorest co-workers. A factor analysis 
of all possible intercorrelations showed a greater than chance expectancy of 
significant loadings of personality measures on the factors representing basic 
dimensions of the union-management relationship. Within limits set by such 
factors as technology and union policy, the personality of the top manager 
or the top union official may significantly modify the course of 
union-management interactions. 


No one today will question the assertion 
that outstanding personalities (e.g., John L. 
Lewis, Walter Reuther, Henry Ford, Sr., and 
Sewell Avery) have had significant impacts 
upon union-management relations in specific 
firms or industries. Observers are generally 
agreed that the personalities of local leaders 
may affect the course of relations in a less 
conspicuous manner (cf. Harbison & Coleman, 
1951; McMurry, in Kornhauser, 1949; 
Selekman, 1947; Stagner, 1956). Demonstrating 
the nature and extent of this relationship 
has, however, been a quite difficult 
task. 

Two methodological problems have arisen 
in this area of research. (a) Because 
union-management relations are so deeply 
influenced by wage level and wage payment plans, 
technology, type of work force, company and 
union organization, etc., it has been apparent 
that personality variables were relatively 
small in contribution to total variance. 
(b) Because quantitative methods for 
assaying relevant dimensions of the 
union-management relationship were lacking, it was 
impossible to assign any confidence level to 
the differences observed. 

A beginning has been made on the second 
problem in the work published by Derber, 
Chalmers, and Stagner (1960). In that study, 
logical analysis of the relationship into 
quantifiable variables led to the construction of 
interview schedules from which scores could 
be derived for several attributes which 
appeared important. Factor analysis (Stagner, 
Chalmers, & Derber, 1959) identified 
underlying dimensions which had respectable 
degrees of functional unity. The present 
investigation utilizes numerical estimates of 
certain aspects of the relationship which were 
explored in the studies cited. In this way 
more precise comparisons become possible. 


1 This research study was financed by grants 
from the Institute of Labor and Industrial 
Relations, the University of Michigan-Wayne State 
University. 

The former problem has been by-passed, 
for purposes of the present research, by 
comparing 33 establishments having a single 
master union contract with similarity in 
wages, fringe benefits, technology, and labor 
force. The population is drawn from 
tool-and-die shops in the Detroit metropolitan 
area.² 


VARIABLES MEASURED 


The interview schedule developed in the 
Derber et al. (1960) study provided for 
quantitative data on 35 aspects of the union- 
management relationship in a plant. Approxi- 
mately half of these related to contract 
negotiations, and hence could not be used in 
the present study (all members of the Em- 


2 My thanks are due to Chester A. Cahn and 
Ned Clarke of the Automotive Tooling Association 
for providing data on the union contract and for 
encouraging employers to participate in the study; 
and to Blaine Marrin and Russell Leach of Locals 
155 and 157, United Automobile Workers, for 
encouraging cooperation on the part of union stewards. 







ployers Association are represented in a 
single bargaining committee). Some other 
items (e.g., skill level, and number of 
employees) would vary so slightly in this 
sample of establishments that they were 
dropped. The 14 variables retained were the 
following: depth of union influence (security, 
etc.), satisfaction with union influence, 
relative wage level, satisfaction with wages, 
emotional tone, pressure (strikes and threats, 
lockouts, etc.), legalism in contract 
interpretation, reported mutual understanding, 
satisfaction with grievance procedure, 
consultation (on issues other than grievances), 
adherence to past practice, conceding points 
to help the other side solve a problem, 
attitude of management to the union, and 
attitude of union to the management. With the 
exception of the last two, all scores were the 
sum of reports given by the manager and 
the steward. 


For reasons of rapport it seemed unwise 
to use any projective tests. The 
Guilford-Zimmerman Temperament scale offers 
objective scores on 10 personality variables 
which seemed reasonably likely to affect an 
interaction such as this one (e.g., general 
vigor, emotional stability, dominance, etc.). 
It has been used successfully with other 
industrial populations and therefore was 
chosen as a representative personality 
inventory. Between the interview on establishment 
practices and the Guilford-Zimmerman, 
respondents were shown a set of 18 pictures 
with varying captions and asked to choose 
a caption for each (a semiprojective test 
developed at Illinois). 

In addition to the foregoing, each respondent 
was asked to think of certain persons 
with whom he had worked and describe them 
by checking graphic rating scales. This 
procedure is based on the work of Fiedler 
(1958), who has proposed that effectiveness 
of organizational functioning is related to 
the “ASo” score of leaders (tendency to 
make sharp discriminations between best and 
poorest co-workers, or to see them as very 
similar). On the hypothesis that the 
union-management relationship is really one 
organization, we included six ASo scores based on 
the following ratings: “a member of the 
Union team with whom you can work best,” 
“a member of the Management team with 
whom you can work best,” “a member of the 
Union team with whom you can work least 
well,” and “a member of the Management 
team with whom you can work least well.” 
The six ASo scores are thus: U-best-least, 
M-best-least, U-best-M-best, U-best-M-least, 
U-least-M-best, and U-least-M-least. These 
are identified in Tables 1 and 2 as U+U-, 
M+M-, U+M+, etc. 
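
The way four rating profiles yield six ASo scores can be sketched as follows. The sketch assumes that each ASo score is a profile distance (the square root of the summed squared differences between two sets of scale ratings), a form commonly associated with Fiedler's work; the exact scoring used in this study is not stated here, and the ratings shown are invented.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, Sequence


def profile_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed ASo measure: distance between two rating profiles.

    A small value means the two persons are described as very similar;
    a large value means they are sharply discriminated.
    """
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def aso_scores(ratings: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """One score for every pair of rated persons (4 persons give 6 pairs)."""
    return {
        first + second: profile_distance(ratings[first], ratings[second])
        for first, second in combinations(ratings, 2)
    }


# Invented ratings by one respondent on three graphic scales.
ratings = {
    "U+": [7, 6, 7],  # Union member he can work with best
    "U-": [3, 2, 4],  # Union member he can work with least well
    "M+": [6, 6, 6],  # Management member he can work with best
    "M-": [2, 3, 3],  # Management member he can work with least well
}
aso = aso_scores(ratings)
```

Because the four ratings are paired in every possible way, the six labels produced (U+U-, U+M+, U+M-, U-M+, U-M-, M+M-) correspond to the six scores listed in the text.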


HYPOTHESES 


Broadly speaking, the hypothesis posed in 
planning this investigation is as follows: the 
personalities of key figures have an important 
impact on the union-management relationship 
at the local level. Problems of 
methodology and statistics, of course, prevent 
any clear-cut answer to a question put 
in such terms. 

Alternative approaches might be to set 
up specific atomistic predictions, e.g., that an 
aggressive union leader will be associated 
with high union influence in a given establishment, 
or that a thoughtful manager will be 
associated with a high rate of consultation 
between management and union. This ap- 
proach offers problems because of the very 
great number of possible hypotheses and the 
laborious course of testing required. 

Actually, one series of statistical tests was 
run on this basis. One of the establishment 
variables developed in the Illinois studies 
related to the “emotional tone” of the 
relationship (relative harmony versus considerable 
expression of antagonism). The 33 
establishments were dichotomized on this 
variable and t tests run for all personality 
measures on both the manager and the chief 
steward. It was obvious, however, that this 
approach left much to be desired. 

What we sought was an overall test of 
relationship between establishment variables 
and personality variables. It was concluded 
that this could best be achieved by factor 
analysis. In the final interview schedule there 
were 14 establishment variables and 16 
personality variables. The null hypothesis was 
proposed that a factor solution would be 
reached which gave establishment vectors 
with no significant personality loadings and 
personality vectors with no significant 
establishment loadings. While exact confidence 
levels cannot at present be determined for 
testing such a hypothesis, it was felt that 
inspection of the data would give a 
satisfactory answer and this expectation was 
confirmed by the outcome. 


PROCEDURE 


Employers were contacted by telephone and 
asked to permit an interview of 1 hour or a little 
more to obtain data on union-management relations 
in the tool-and-die industry. A letter to the same 
effect had already gone out from the Employers 
Association. Approximately 90% of the 43 firms 
contacted agreed to cooperate, although later failure 
to obtain personality data from one side or the
other reduced the population to 33 firms.

After obtaining answers to the interview sched- 
ule, the research assistant asked for permission to 
meet the union steward³ and obtain parallel data
from him. He also asked the manager to fill out 
a form using Fiedler’s “assumed similarity” format 
(Fiedler, 1957), to respond to a few pictures by 
choosing alternative descriptive captions, and to 
complete the Guilford-Zimmerman Temperament 
scale. (Most respondents were allowed to keep this 
and fill it out at leisure; it was picked up the next 
day, or as soon as it was available. Failure to get 
this item accounts for most of the incomplete data 
noted above.) 

The union stewards were generally interviewed 
(individually, of course) on company property. 
In a few cases it was apparent that this was not 
approved by the manager and appointments were 
made at the union hall or at the steward’s home. 
The interview schedules and personality data were 
identical for the manager and the unionist with the 
exception of a few items in the interview (e.g., one 
filled out a scale for “union attitude toward man- 
agement,” the other one on “management attitude 
toward the union”).

Establishment scores were computed for 14 of 
the variables reported in the Derber-Chalmers- 
Stagner study. These are obtained (with the excep- 
tion of the attitude scales just cited) by averaging 
the responses of the manager and the steward. These 
scores were intercorrelated (Pearson r) with the 
assumed similarity scores and the Guilford-Zimmer- 
man scores, giving two matrices, one based on 
the manager’s “personality” data, the other on the 
steward’s data. Thus the first 14 variables shown in
Tables 1 and 2 were identical in both matrices,
the other 16 being different.⁴ These were analyzed
by the principal axes method and rotated using the
Varimax program (Kaiser, 1959).⁵

³ Since these shops were small, averaging about
50 employees, there was generally no group of
union officers. A chief steward, elected by the union
members in the shop, handled day-to-day problems,
and difficult grievances were settled with the presi-
dent of one of the combined locals. My thanks are
due to Kay H. Smith, who conducted the inter-
views, and to Harvey Nussbaum, who prepared the
data for the computer.
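The idea behind a Varimax rotation can be illustrated for a single pair of factors. The sketch below uses the planar-rotation angle formula for the raw varimax criterion; it is not the program cited in the text, whose details (normalization, cycling over all factor pairs) are not described here. The function names and the four-variable loading matrix are hypothetical and chosen only for illustration.

```python
import math

def varimax_pair_angle(x, y):
    """Angle (radians) maximizing the raw varimax criterion for two loading columns."""
    n = len(x)
    u = [a * a - b * b for a, b in zip(x, y)]
    v = [2 * a * b for a, b in zip(x, y)]
    A, B = sum(u), sum(v)
    C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
    D = sum(2 * ui * vi for ui, vi in zip(u, v))
    num = D - 2 * A * B / n
    den = C - (A * A - B * B) / n
    # atan2 picks the quadrant of 4*phi that gives a maximum
    return math.atan2(num, den) / 4

def rotate_pair(x, y, phi):
    """Orthogonally rotate two loading columns through the angle phi."""
    c, s = math.cos(phi), math.sin(phi)
    new_x = [a * c + b * s for a, b in zip(x, y)]
    new_y = [-a * s + b * c for a, b in zip(x, y)]
    return new_x, new_y

def varimax_criterion(cols):
    """Sum over factors of the variance of the squared loadings."""
    total = 0.0
    for col in cols:
        sq = [a * a for a in col]
        mean = sum(sq) / len(sq)
        total += sum((s - mean) ** 2 for s in sq) / len(sq)
    return total

# Hypothetical loadings: a clean two-factor structure turned 30 degrees away
x = [0.866, 0.866, -0.5, -0.5]
y = [0.5, 0.5, 0.866, 0.866]

phi = varimax_pair_angle(x, y)
rx, ry = rotate_pair(x, y, phi)
before = varimax_criterion([x, y])
after = varimax_criterion([rx, ry])
```

In this toy case the rotation recovers the simple structure: each variable ends up loading on one factor only, and the criterion rises accordingly.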

RESULTS 

Our major interest will be in the extent to 
which dimensions of the relationship, as 
identified in the earlier study, recur in this 
set of data and in the loadings of the per- 
sonality variables on these dimensions. Be- 
fore turning to these, a few rough compari- 
sons may be presented. 

Of the 224 raw correlations between estab- 
lishment scores and personality scores for 
managers, 11 were greater than .33, which 
corresponds to the .05 level for N = 33. The 
same establishment scores yielded 20 cor- 
relations above .33 with union steward per- 
sonality scores. They are slightly above 
chance expectancy; however, they may not 
reflect relationships with basic dimensions 
(factors). 
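The chance expectancy referred to above can be worked out directly: 14 establishment scores by 16 personality scores give 224 coefficients, so about 11 would exceed the .05 cutoff by chance alone. The sketch below also gives a binomial tail probability, which rests on the assumption (not made in the text, and certainly not strictly true for correlated scores) that the 224 coefficients are independent; it is a rough guide only.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_correlations = 14 * 16          # establishment x personality scores = 224
alpha = 0.05                      # significance level used in the text

expected_by_chance = n_correlations * alpha   # about 11.2

# Observed counts reported in the text: 11 (managers) and 20 (stewards).
# Independence of the coefficients is an illustrative assumption.
p_managers = binom_tail(n_correlations, 11, alpha)
p_stewards = binom_tail(n_correlations, 20, alpha)

print(f"expected by chance: {expected_by_chance:.1f}")
print(f"P(at least 11) = {p_managers:.3f}")
print(f"P(at least 20) = {p_stewards:.4f}")
```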

Twelve factors were identified by the 
principal axes method. Of the 360 unrotated 
factor loadings, 89 exceeded .30 in the man- 
agement matrix and 97 were above .30 in 
the union matrix. The two factors with the 
smallest roots were dropped and 10 were 
rotated. After Varimax rotation, there were, 
respectively, 51 and 56 loadings above .30. 
It would appear that the two matrices were 
substantially similar. 

This similarity becomes less apparent when 
we turn to a consideration of the rotated 
factors and the loadings of specific variables 
on these factors. Table 1 shows the rotated 
factor structure for the managers in our 
sample. The following comments on each 
factor suggest possible interpretations. 


⁴ The original correlation matrices and the un-
rotated factor loadings have been deposited with the
American Documentation Institute. Order Document
No. 7259 from the ADI Auxiliary Publications Proj-
ect, Photoduplication Service, Library of Congress,
Washington 25, D. C., remitting in advance $1.25
for microfilm or $1.25 for photocopies. Make checks
payable to: Chief, Photoduplication Service, Library
of Congress.

⁵ I wish especially to thank Kern Dickman and
the staff of the University of Illinois Digital Com-
puter Laboratory, Walter Hoffman and the Wayne
State University Computing Center, and Charles
Wrigley and the Michigan State University Com-
putation Laboratory, for aid in these statistical
analyses.





PERSONALITY VARIABLES IN UNION-MANAGEMENT RELATIONS 


TABLE 1

LOADINGS OF 30 VARIABLES ON 10 FACTORS AFTER VARIMAX ROTATION: MANAGEMENT DATA

Columns: Factors I through X. [The numerical loadings are not legible in this copy; only the row labels survive.]

Establishment variables: Depth of influence; Satisfaction with influence; Wages; Satisfaction with wages; Emotional tone; Pressure; Legalism; Understanding; Satisfaction with procedures; Consultation; Attitude, union to management; Attitude, management to union; Past practice; Conceding.

Personality variables: D1 M+U+; D2 M+M-; D3 M+U-; D4 M-U+; D5 U+U-; D6 M-U-; General activity; Restraint; Ascendance; Social interest; Emotional stability; Objectivity; Friendliness; Thoughtfulness; Personal relations; Masculinity.

M-I. This appears to be a kind of “social 
distance” or personal perception dimension. 
Four of the assumed similarity scores load
9 or better on this factor. The Guilford
personal relations score has a .49 loading.
Fiedler has interpreted his ASo scores as 
indicating interpersonal distance and the 
data seem to confirm this. The factor does 
not, however, relate to establishment scores. 
The only suggestive figure is a —.25 for 
attitude of management to union, and this 
may merely reflect the manager’s personality, 
since this is not a composite score. 

M-II. This seems to be the factor of union
achievement reported earlier (Derber et al.,
1960, p. 54, Factor 4). The largest loading
(.83) is with reported wages, but satisfac- 
tion with union influence and satisfaction 
with wages have large loadings and attitude 
of union to management is also represented. 
The only interesting correlation with person- 
ality variables is a —.44 with managers’ 
thoughtfulness, and this is not easy to 
interpret. 

M-III. The heaviest loading here is with
trait score for friendliness (more accurately
pacifistic submissiveness). The managers
high on this trait seem more satisfied with
the state of union influence, have a positive
attitude to the union, and report more
mutual conceding in the relationship. These
are plausible, given the Guilford-Zimmerman
definition of the score.


TABLE 2

LOADINGS OF 30 VARIABLES ON 10 FACTORS AFTER VARIMAX ROTATION: UNION DATA

Columns: Factors I through X. [The numerical loadings are not legible in this copy; only the row labels survive.]

Establishment variables: Depth of influence; Satisfaction with influence; Wages; Satisfaction with wages; Emotional tone; Pressure; Legalism; Understanding; Satisfaction with procedures; Consultation; Attitude, union to management; Attitude, management to union; Past practice; Conceding.

Personality variables: D1 M+U+; D2 M+M-; D3 M+U-; D4 M-U+; D5 U+U-; D6 M-U-; General activity; Restraint; Ascendance; Social interest; Emotional stability; Objectivity; Friendliness; Thoughtfulness; Personal relations; Masculinity.


M-IV. This factor, with two assumed simi- 
larity scores having high loadings and reli- 
ance on past practice in the relationship also 
high, does not seem meaningful. 

M-V. This factor is closely related to the
pressure aspect of labor relations: strikes and
threats of strikes. Establishments with little
pressure have favorable emotional tone, are
satisfied with procedures, and report conces-
sions by each side. (It resembles Factor 10
in the Derber et al., 1960 study.) The only
managerial personality trait represented is
restraint and the direction seems opposite to
that one might have expected.

M-VI. This is a personality factor which
we may call social dominance or ascendance.
Several of the Guilford scores have sizable
loadings. The only establishment variable
represented is “conceding” and this is in the
expected direction (high dominance goes with
less conceding).

M-VII. This is another personality factor
heavily loaded on masculinity and emotional
stability. In this case substantial loadings on
establishment scores occur, and generally in
the expected pattern. Managers high on
masculinity are found in shops where con-
sultation is rarely practiced, conceding is
infrequent, and the union’s attitude is
somewhat hostile.

M-VIII. This factor has only a single 
heavy loading, for depth of union influence, 
which suggests that in our sample this vari- 
able may be relatively independent of the 
others. In the study of Derber et al., Factor 4 
also included other indices of union achieve- 
ment not covered in this investigation and 
some of those in Factor M-II of this study. 

M-IX. This is another establishment fac- 
tor, chiefly identified as emotional tone, but 
carrying appropriate loadings on wage satis- 
faction and attitudes of both union and 
management. This factor resembles Factor 10 
in the earlier study. Contrary to our expecta- 
tions, it did not involve any personality 
variables. 

M-X. This factor suggests a dimension 
of legalistic, “arms length” relationships with 
legalistic shops showing little evidence of 
mutual understanding or concession. It has 
some resemblance to Factor 9 in the earlier 
study. Managers in shops of this variety are 
below average on general activity; it seems 
plausible that an executive who was not very 
vigorous might adopt legalistic tactics as a 
protective mechanism. 

Let us now turn to the findings from the 
union data, which are summarized in Table 2. 
Some of these parallel the management data 
cited above, while others do not. 

U-I. Four of the steward’s assumed simi- 
larity scores load highly on this variable; 
attitude of management is represented, as is 
conceding behavior. There may be meaning 
here, but it is hard to locate. 

U-II. This is a friendliness factor compa- 
rable to M-III above. Unlike M-III it has 
no significant loadings for any establishment 
scores. This may suggest that submission on 
the steward’s part has no effect on the rela- 
tionship, whereas it does in the case of the 
manager. 

U-III. This seems to be the pressure factor 
(resembling M-V above and Factor 10 in the 
earlier study). Shops with frequent pressure 
tactics show less satisfaction with grievance 
procedures and with union influence, al-
though, curiously, they report satisfaction
with wages. Could the higher wages be a
consequence of the pressure? Or is it rather
that, since wage rates are fairly well limited 
by the contract, the dissatisfactions which 
lead to pressure are about grievance han- 
dling? Two assumed similarity scores pop 
up here and the steward’s general activity 
(vigor) score is highly related—but in a 
negative direction! This is quite contrary to 
expectation. 

U-IV. This is an ascendance factor re- 
sembling M-VI. Highly ascendant stewards 
are found in shops with poor emotional tone, 
which is to be expected. These shops also 
report more than average consultation, which 
may mean that the steward demands and 
obtains it. 

U-V. This looks like the union achieve- 
ment factor M-II and Factor 4 in the pre- 
vious study. It has moderate negative 
loadings for one ASo score and for thought- 
fulness of the steward. (Thoughtfulness in 
the Guilford-Zimmerman scale is a tendency 
toward introverted reflection; we may at least 
speculate that such stewards do not foster 
union achievement.)

U-VI. The highest loading here is for 
restraint and other Guilford scores. The only 
establishment score involved is for consulta- 
tion; stewards low on restraint are in shops 
low on consultation and this will surprise few 
observers. 

U-VII. This resembles M-VII, with the 
highest loading on masculinity. Stewards 
high on masculinity seem to be in shops high 
on understanding and on pressure; these ap- 
pear to be contradictory. Since this factor is 
presumably orthogonal to U-IV, we need to 
ponder what aspect of masculinity may be 
predominantly represented here. 

U-VIII. This has a very high loading on 
depth of union influence, as does M-VIII. 
In this case, however, we find some other 
significant loadings; one ASo score, and gen- 
eral activity of the steward. This appears 
meaningful, since steward’s level of vigor 
would be related to union influence, whereas
one would not expect the manager’s person-
ality traits to be reflected on this dimension.


U-IX. This seems to be the emotional 
tone factor (M-IX; Derber et al., Factor 
10). As in the management data, favorable 
emotional tone goes with wage satisfaction
and little reliance on past practice. The only
personality variable represented is the stew- 
ard’s objectivity or tendency to have feelings 
not easily hurt. (Good emotional tone is as- 
sociated with stewards not too sensitive.) 

U-X. As in M-X, this has an arms length 
flavor, high on legalism, low on mutual un- 
derstanding and conceding. There is some 
resemblance to Factor 9 in the earlier study. 
Curiously, the legalistic establishments seem 
to have friendly stewards; this may be a 
statistical accident, although it may indicate 
that nonbelligerent stewards prefer a legal- 
istic interpretation of the contract. 

In summary: on the management analysis
five factors relate primarily to establishment 
variables and resemble some of the factors 
reported in the 1959 study. Three are clearly 
personality dimensions, and two are ambigu- 
ous. The union matrix similarly produced 
five factors relating to the establishment, 
again resembling the earlier data; four are 
personality dimensions and one is equivocal. 


DISCUSSION 


It seems entirely plausible that managers 
who are friendly in the Guilford-Zimmerman 
sense, i.e., anxious to avoid hostility, might 
report more satisfaction with union influence 
and more conceding to help out the other 
party in grievance settlements; similarly it 
is reasonable that ascendant, dominative 
stewards are found in shops with poor emo- 
tional tone, reports of hostility, etc. Such 
specific findings as these lend support to the 
somewhat vague hypothesis with which we 
started, viz., that personality traits of leaders 
affect the local plant labor relations situation. 

Against this point we must note that the 
overall test of significance of personality-
establishment relations is not very encourag-
ing. For an N of 33, the 5% confidence level
requires an r of .33. In the 50 loadings of the
10 Guilford scores on the 5 establishment
factors, only 3 exceed .33 in the management
matrix, and only 4 in the union matrix. These
barely exceed chance expectations. Similarly,
in the raw correlation matrices, 9 exceed .33
in the management data, 10 in the union data.
Since there are 140 raw coefficients of Guil-
ford scores with establishment scores, these
likewise are only a little above chance ex-
pectancy. This test is somewhat too severe,
in that it is not plausible to expect all 10 G-Z 
scores to relate to union-management inter- 
actions, nor could we expect all 14 establish- 
ment scores to be affected by leader person- 
alities. It does suggest that the findings re- 
ported here lack firm statistical support. 
Further progress must be made by formulat- 
ing more precise hypotheses and testing them 
on new populations. 
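The cutoff of .33 for N = 33 can be checked against the usual conversion from a critical t value to a critical correlation, r = t / sqrt(t² + df) with df = N - 2. The sketch below assumes a two-tailed test and takes the critical t for 31 degrees of freedom (2.0395) from a standard table; neither assumption is stated in the text. The result comes out near .34, close to the value quoted.

```python
import math

def critical_r(t_crit, n):
    """Smallest |r| reaching significance, given the critical t for df = n - 2."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Tabled two-tailed .05 critical value of t for 31 df (assumed, not from the text)
T_CRIT_TWO_TAILED_05_DF31 = 2.0395

r_crit = critical_r(T_CRIT_TWO_TAILED_05_DF31, 33)
print(f"critical r for N = 33: {r_crit:.3f}")
```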

Statistical support for the hypothesis that 
Fiedler’s ASo scores are relevant is even 
weaker. Among the raw data, 84 correlations 
relate ASo to establishment variables. For 
managers only 2 of 84 reached .33; for 
stewards, 10 of 84 met the 5% criterion. In 
the factor analysis, only 3 steward ASo scores 
had loadings above .33 on the 5 establish- 
ment factors; and only one manager ASo 
score had such a loading on the 5 establish- 
ment factors. It thus appears that the Guil- 
ford-Zimmerman scores produce slightly more 
than the expected number of significant cor- 
relations and loadings, but the Fiedler scores 
produce less than a chance number of sig- 
nificant interrelations for managers, slightly 
above chance for stewards. 
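The comparisons with chance in the two preceding paragraphs can be gathered into one tally. The sketch below multiplies each number of coefficients by .05 and sets the result beside the counts reported in the text. The figure of 30 ASo loadings (6 ASo scores on 5 establishment factors) is inferred rather than stated, and the labels are illustrative.

```python
ALPHA = 0.05

# (label, number of coefficients, managers above .33, stewards above .33)
tallies = [
    ("Guilford loadings on 5 establishment factors", 50, 3, 4),
    ("Guilford raw correlations", 140, 9, 10),
    ("ASo raw correlations", 84, 2, 10),
    ("ASo loadings on 5 establishment factors", 30, 1, 3),  # 30 is inferred: 6 x 5
]

def chance_table(rows, alpha=ALPHA):
    """Attach the number expected by chance to each row of observed counts."""
    return [(label, n * alpha, mgr, stw) for label, n, mgr, stw in rows]

summary = chance_table(tallies)
for label, expected, mgr, stw in summary:
    print(f"{label:46s} expected {expected:4.1f}  managers {mgr:2d}  stewards {stw:2d}")
```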

Of considerable interest is the fact that 
this analysis, while using only a portion of 
the interview schedule employed in the 
Derber-Chalmers-Stagner monograph, repro- 
duced with fair accuracy four of the establish- 
ment dimensions; and these were identifiable 
in both matrices. While the raw correlations 
of the establishment variables were identical 
in the analysis of Table 1 and Table 2, it is 
obvious that the matrix of intercorrelations 
with personality variables might have induced 
greater variations in factor loadings than can 
be observed here. This argues that we have 
identified some dimensions which are reason- 
ably common, at least within the American 
economic structure. 

Somewhat more disappointing is the fact 
that establishment factors have few person- 
ality loadings, and personality factors have 
few substantial loadings for establishment 
variables. Nevertheless, such findings are not 
entirely absent. In the management matrix, 
there are three personality factors (friendli- 
ness, ascendance, and masculinity) which 
have plausible loadings on establishment
variables such as conceding, satisfaction with
union influence, and consulting. There are 
also three establishment factors (union 
achievement, pressure, and legalism) which 
have at least one personality variable 
plausibly associated. 

In the union data we find likewise three 
personality factors (friendliness, ascendance, 
and restraint) each of which has at least one 
establishment variable plausibly correlated 
with it; and four establishment factors (pres- 
sure, union achievement, depth of union in- 
fluence, and emotional tone) which have 
plausible personality correlates. 

It thus appears that, when the wider 
socioeconomic variables are controlled as in 
this study, some personality variables can 
be shown to be involved in meaningful ways 
with aspects of the union-management inter-
action. One may suspect that the role of the
personality of the manager or of the steward 
is not great, if such elaborate procedures are 
required to reveal it. This criticism tends to 
miss the point. The observer of union-manage- 
ment affairs can, on a purely subjective basis, 
identify the role of an outstanding leader such 
as a John L. Lewis or a Henry Ford, Sr. The 
elaborate procedural apparatus is necessary 
to demonstrate, objectively and with statis- 
tical props, that these influences are operative 
at the shop level and in a logically plausible 
fashion. 

Other findings. As noted above, we had 
expected the emotional tone variable to have 
significant personality correlates. To our sur-
prise, a series of t tests on the G-Z scores for
managers and stewards yielded only one
difference reaching the 5% level: shops with
low emotional tone tend to have stewards
who are sensitive and thin-skinned (this
confirms the observation on Factor U-IX).

In the ratings of union man with whom 
you can work best, etc., the data indicate that 
respondents on each side agree on the at- 
tributes of the “best” people on either side, 
and that these idealized persons are very 
similar. However, in rating the “worst” person 
on one’s own side, every respondent clung 
close to the neutral point on the rating scale. 
Ratings of the worst opponent thus gave the 
widest variation. Both sides, however, tended 
to agree that this worst individual was un- 
friendly, uncooperative, excitable, hardhearted, 
and quick-tempered. Certainly anyone per- 
ceived as having such attributes would not 
be a preferred co-worker! 

REFERENCES 

Derber, M., Chalmers, W. E., & Stagner, R. The
local union-management relationship. Champaign:
Institute of Labor and Industrial Relations, Uni-
versity of Illinois, 1960.

Fiedler, F. E. Leader attitudes and group effective-
ness. Urbana: Univer. Illinois Press, 1958.

Harbison, F. H., & Coleman, J. R. Goals and
strategy in collective bargaining. New York:
Harper, 1951.

Kaiser, H. F. Computer program for Varimax
rotation in factor analysis. Educ. psychol. Measmt.,
1959, 19, 413-420.

Kornhauser, A. (Ed.) Psychology of union-manage-
ment relations. Champaign, Ill.: Industrial Rela-
tions Research Association, 1949.

Selekman, B. M. Labor relations and human rela-
tions. New York: McGraw-Hill, 1947.

Stagner, R. Psychology of industrial conflict. New
York: Wiley, 1956.

Stagner, R., Chalmers, W. E., & Derber, M.
The dimensionality of union-management relations
at the local level. J. appl. Psychol., 1959, 43, 1-7.


(Received September 28, 1961) 





Journal of Applied Psychology 
1962, Vol. 46, No. 5, 358-360 


IDENTIFICATION OF COLA BEVERAGES 


FREDERICK J. THUMIN ¹


Washington University 


An attempt was made to overcome certain methodological inadequacies of 
earlier studies in determining whether cola beverages can be identified on the 
basis of taste. Some 79 Ss completed questionnaires on their cola drinking 
habits and brand preferences, then were tested individually on samples of 
cola beverages presented under methods of paired comparisons. Significant 
chi square values were obtained for Coca Cola and Pepsi Cola, due to the 
large number of correct identifications for these brands. Correct identification 
of Royal Crown, however, did not differ from chance expectancy. No significant 
relationship was found between ability to identify cola beverages and degree 
of cola consumption; nor were Ss any better at identifying their “regular”
brand than they were other brands.


Earlier studies attempting to determine 
whether cola beverages can be identified on 
the basis of taste² have, in the main, ob-
tained negative results (Bowles & Pronko,
1948; Pronko & Bowles, 1948; Pronko & 
Bowles, 1949; Pronko & Herman, 1950; 
Prothro, 1953). These results may, in part, 
be attributed to certain methodological dif- 
ficulties. For example, in the majority of 
these studies, the subjects were not informed 
as to what brands they were attempting to 
identify. This lack of restriction encouraged 
guessing behavior, which resulted in the 
naming of irrelevant beverages (e.g., Dr. 
Pepper), as well as relatively frequent men- 
tions of the more heavily advertised brands 
such as Coca Cola. 

Moreover, the subjects were expected to 
identify the various colas on the basis of past 
experience, yet apparently no attempt was 
made to determine whether the subjects had 
ever tasted these beverages, or to relate 
identification to degree of cola consumption. 

Each of these previous studies used es-
sentially the same method of stimulus presen-
tation; namely, all beverages were presented
simultaneously to the subject, and only one
such presentation was made. This technique,
while satisfactory, would appear to be some-
what less sensitive than the method of paired
comparisons, which requires the subject to
identify each brand a number of times under
various experimental conditions.

¹ The author wishes to express his appreciation to
A. Barclay who served as critical reader for earlier
drafts of this paper.

² In this report, the word “taste” is used in the
broad sense: i.e., to include gustation, olfaction, and
possible tactual qualities as well.

Thus, the purpose of the present study was 
to determine whether methodological inade- 
quacies in the earlier studies may have con- 
tributed to the subjects’ relative inability to 
identify brands. The primary modifications 
in experimental design were as follows: an 
indication of cola consumption habits was 
obtained, subjects were told in advance 
what beverages they were attempting to 
identify, and the method of paired compari- 
sons was used for presentation of stimuli. 


METHOD 


Seventy-nine subjects were employed, all of whom
were either college students or college graduates
between the ages of 18 and 37 years. The subjects
were first asked to fill out a questionnaire on their
cola consumption habits and brand preferences. The
cola beverages were presented to the subjects indi-
vidually in an experimental room which was kept
dimly lighted to eliminate possible visual cues.
Instructions were as follows:


I would like to have you taste and identify
some cola drinks. I will place two cups at a time
in front of you, one on your left and one on
your right. Taste these two colas in any order
you wish; then tell me what brand you think
each one is. Be careful not to change the position
of the cups while you are tasting them; that is,
keep the left cup on the left, and the right cup
on the right. Each time you finish with one pair
of cups, rinse your mouth well by taking a few
swallows of water from the water cup. When you
have done this, I will give you the next pair.
There are three colas involved in this study:
Coca Cola, Pepsi Cola, and Royal Crown. Even
if you are not sure of the brand in some cases,
I still want you to tell me what brand you think
it is. The two members of a pair are always dif-
ferent brands; that is, a brand is never compared
with itself. Are there any questions?


Using the method of paired comparisons, six pairs
of beverages were presented to the subject, one pair
at a time. The subjects were exposed to each brand
four times for a total of 12 judgments. The order
of presentation of stimulus pairs was randomly
determined. Stimulus cups contained 2 ounces of
the beverage at an approximate temperature of 5°
centigrade.
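The counts in this design can be reproduced with a short sketch. Three brands yield three distinct pairings, so six pairs imply that each pairing was shown twice; the sketch assumes the second showing simply reversed the left and right cups, which the text does not state. The function name and the seed are illustrative.

```python
import itertools
import random
from collections import Counter

BRANDS = ["Coca Cola", "Pepsi Cola", "Royal Crown"]

def presentation_schedule(seed=None):
    """Return the six (left, right) pairs in random order.

    Assumption: each pairing appears once in each left/right arrangement.
    """
    pairs = list(itertools.permutations(BRANDS, 2))  # ordered pairs, no self-pairs
    random.Random(seed).shuffle(pairs)
    return pairs

schedule = presentation_schedule(seed=1962)
exposures = Counter(brand for pair in schedule for brand in pair)

for left, right in schedule:
    print(f"left: {left:12s} right: {right}")
print(dict(exposures))
```

Each brand appears four times across the six pairs, giving the 12 judgments per subject mentioned above.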


RESULTS 


The chi square was used to determine
whether ability to identify brands differed
significantly from chance expectancy. As
Table 1 shows, the chi square values for both
Coca Cola and Pepsi Cola were significant
at the .01 level of confidence, while that for
Royal Crown was not significant. Inspection
of the data indicates that the significant
divergencies obtained with Coca Cola and
Pepsi Cola are due to the large number of
correct identifications of these brands; for
example, more than twice the expected num-
ber of subjects were able to identify these
brands correctly at least three times out of
four.

The results presented in Table 2 indicate
that ability to identify cola beverages cor-
rectly was unrelated to degree of consump-
tion; i.e., correct identifications were essen-
tially the same for heavy, medium, and light
cola drinkers. Further analysis of the data
showed that ability to identify a given brand
was also unrelated to whether that brand was
considered by the subject to be his “regular”
brand.


TABLE 1

CHI SQUARE FOR OBSERVED AND EXPECTED FREQUENCIES OF BRAND IDENTIFICATION

Columns: brand of cola; observed and expected frequencies; number of correct identifications (0 and above). Rows: Coca Cola; Pepsi Cola; Royal Crown; All brands. [Cell values are not legible in this copy.]


TABLE 2

CHI SQUARE FOR BRAND IDENTIFICATION RELATED TO CONSUMPTION

Columns: number of colas consumed per week; number of correct identifications (0-3, 4-6, and further ranges not legible). [Only fragments of the cell values are legible in this copy.]

Heavy (7 or more): 10, 14 ... (8.5 ...)
Medium (3-6): 7 ... (8. ...)
Light: ...

By telling the subjects in advance what
brands they were attempting to identify, ir-
relevant brand naming was eliminated as well
as excessive naming of heavily advertised
brands. Specifically, Coca Cola was men-
tioned 317 times, Pepsi Cola 321 times, and
Royal Crown 310 times.
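The mention counts can be checked against the design: 79 subjects making 12 judgments each give 948 brand names in all, and an even split would be 316 per brand. The sketch below also computes a goodness-of-fit chi square against that even split. This statistic is offered only as an illustration; it is not a test reported in the text.

```python
mentions = {"Coca Cola": 317, "Pepsi Cola": 321, "Royal Crown": 310}

total_mentions = sum(mentions.values())
total_judgments = 79 * 12                 # subjects x judgments per subject

# Even split across the three brands
expected_per_brand = total_mentions / len(mentions)

# Illustrative goodness-of-fit statistic (df = 2); not reported by the author
chi_square = sum(
    (observed - expected_per_brand) ** 2 / expected_per_brand
    for observed in mentions.values()
)

print(total_mentions, total_judgments, expected_per_brand)
print(f"chi square = {chi_square:.3f}")
```

The statistic falls far below 5.99, the .05 critical value for 2 degrees of freedom, which agrees with the statement that no brand was named excessively.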


DISCUSSION 


The present study clearly demonstrated
that certain brands of cola can be identified
on the basis of taste. The significant chi
square values obtained with Coca Cola and
Pepsi Cola were due to the large number of
correct identifications for these brands. The
subjects’ inability to identify Royal Crown
Cola can probably be attributed to a lack
of recent experience with this brand. Some
58% of the subjects said they had not had
a Royal Crown for at least 6 months prior
to the experiment.
No relationship was found between ability
to identify cola beverages and degree of cola
consumption (i.e., number of colas consumed
in an average week). Moreover, the subjects
were no better at identifying their regular
brand than they were at identifying other
brands. Thus, it would appear that the sub-
jects needed a certain minimal amount of
recent experience with a brand in order to
identify it, but beyond this minimal amount,
additional experience (i.e., heavier consump-
tion) did not help.

Within the framework of this study, the 
method of paired comparisons proved to be 
sufficiently sensitive to detect small, but sig- 
nificant abilities to identify cola beverages. 
There appeared to be no problem with the 
development of sensory adaptation as suc- 
cessive pairs of stimuli were presented. 
Analysis of the data revealed that, as trials
progressed, the subjects showed small
(though nonsignificant) increases in ability
to identify brands.



REFERENCES 

Bowles, J. W., Jr., & Pronko, N. H. Identification
of cola beverages: II. A further study. J. appl.
Psychol., 1948, 32, 559-564.

Pronko, N. H., & Bowles, J. W., Jr. Identifica-
tion of cola beverages: I. First study. J. appl.
Psychol., 1948, 32, 304-312.

Pronko, N. H., & Bowles, J. W., Jr. Identifica-
tion of cola beverages: III. A final study. J. appl.
Psychol., 1949, 33, 605-608.

Pronko, N. H., & Herman, D. T. Identification of
cola beverages: IV. Postscript. J. appl. Psychol.,
1950, 34, 68-69.

Prothro, E. T. Identification of cola beverages
overseas. J. appl. Psychol., 1953, 37, 494-495.


77 
Zi, 


(Received October 1961) 





Journal of Applied Psychology 
1962, Vol. 46, No. 5, 361-364 


MOOD CHANGES DURING A MANAGEMENT TRAINING LABORATORY


BERNARD M. BASS 


Graduate School of Business, University of Pittsburgh 


30 trainees completed a mood adjective check list at 5 periods during a 10-day
sensitivity training laboratory for management. Results indicated that dif-
ferences and predictable shifts in mood do occur. Specifically, skepticism and
anxiety decreased as most trainers would expect; other moods, like depression
and aggression, showed the effects of particular training laboratory procedures.
The results also provided independent evidence of the validity of the factored
Nowlis and Green check list of mood adjectives. Findings generally were
consistent with trainer beliefs about mood changes in trainees and conformed
to expectations about the effects on mood of defeat or victory in intergroup
competition.


The increasing interest in sensitivity train- 
ing for management is illustrated by Tan- 
nenbaum, Weschler, and Massarik (1961) who 
build the first part of their recent book on 
leadership and organization around the need 
of managers for sensitivity. Then they proceed 
to examine how sensitivity is learned. A 
rationale for relating sensitivity to successful 
leadership is provided by Bass (1960, p. 167). 
Yet published objective evaluations of what
happens to supervisors during and after training
are scarce. In a previous article (Bass,
1962b), data were presented showing that
trainees do seem to increase their sensitivity
to interpersonal phenomena, as measured by
reactions to the film Twelve Angry Men
before and after training.

Sensitivity training has been described as
“guts” learning: learning at an emotional
level about group dynamics, realized intellectually,
but not fully accepted by trainees.
Despite considerable experience and much
commentary about how feelings are shifted
during 10 days or 2 weeks of sensitivity training,
relatively little objective information is
available.

The present paper reports the results of
sampling trainee moods at various points during
the course of a 10-day sensitivity training
laboratory.


METHOD 


Green and Nowlis (1957) developed, factored,
and validated a mood adjective check list. The
subjects’ responses to the check list were affected by
viewing various kinds of films having emotional impact
as well as by experiencing contrived frustrations.
Eight factors could account for the common variance
in responding to 110 mood adjectives (Green &
Nowlis, 1957). In the present study, the eight
factors were assessed using 27 adjectives from the
original list. Each selected adjective correlated highly
with one factor but not with the others. An exception
was made. The adjectives “suspicious” and “skeptical,”
included in Green and Nowlis’ aggression factor,
were treated as a ninth factor, skepticism, for it
was guessed that the “suspiciousness” component of
aggression would be affected more by the laboratory
experience than the “rebelliousness” or “boldness”
aspect.1

The factors, each scored as the mean
response to all mood adjectives representing it,
were as follows:


MOOD FACTORS          MOOD ADJECTIVES

A. Concentration      concentrating, serious, earnest, engaged in thought
B. Aggression         angry, bold, defiant, rebellious
C. Pleasantness       pleased, elated, lighthearted
D. Activation         energetic
E. Egotism            boastful, self-centered, egotistic
F. Social Affection   forgiving, kindly, warmhearted
G. Depression         blue, lonely, regretful, insecure
H. Anxiety            anxious, clutched up, fearful
I. Skepticism         skeptical, suspicious


The subjects responded by inserting each of the 27
words in the appropriate blank of the following scale:

4. I definitely feel ________ at this moment.
3. I feel slightly ________ at this moment.
2. I cannot decide whether or not I am ________ at this moment.
1. I am sure I am not ________ at this moment.
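The scoring rule described above (a factor score is the mean response to the adjectives representing that factor) can be sketched in a few lines of Python. This is only an illustration, not the author's procedure: it assumes the four response options are coded 4 ("definitely feel") down to 1 ("sure I am not"), which the surviving "1." label and the reported means suggest, and the trainee responses shown are hypothetical.

```python
from statistics import mean

# Adjectives per mood factor, as listed in the article (27 in all).
FACTORS = {
    "Concentration": ["concentrating", "serious", "earnest", "engaged in thought"],
    "Aggression": ["angry", "bold", "defiant", "rebellious"],
    "Pleasantness": ["pleased", "elated", "lighthearted"],
    "Activation": ["energetic"],
    "Egotism": ["boastful", "self-centered", "egotistic"],
    "Social Affection": ["forgiving", "kindly", "warmhearted"],
    "Depression": ["blue", "lonely", "regretful", "insecure"],
    "Anxiety": ["anxious", "clutched up", "fearful"],
    "Skepticism": ["skeptical", "suspicious"],
}


def factor_scores(responses):
    """Return the mean response (assumed 1-4 coding) for each mood factor."""
    return {
        factor: mean(responses[adjective] for adjective in adjectives)
        for factor, adjectives in FACTORS.items()
    }


# Hypothetical trainee: "sure I am not" (1) on everything except a few items.
example = {adj: 1 for adjs in FACTORS.values() for adj in adjs}
example.update({"anxious": 4, "clutched up": 3, "fearful": 2, "energetic": 3})

scores = factor_scores(example)
print(scores["Anxiety"], scores["Activation"], scores["Egotism"])
```

Averaging within a factor keeps scores on the same 1-to-4 scale regardless of how many adjectives a factor has, which is what allows factors with one adjective (Activation) and four adjectives (Depression) to share one axis in a plot of mood level over time.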


1 Subsequent private conversations with V. Nowlis 
revealed that recent refactorizations by him have 
yielded separate skepticism and aggression factors. 





Thirty supervisors, engineers and administrators
from the same petrochemical refinery, completed
mood check lists at five times during a 10-day
laboratory: (a) at the beginning; (b) on the third
morning just prior to an exciting 24-hour intergroup
competition; (c) the next day at the end of
the competition; (d) upon return from a weekend
holiday lasting from Saturday noon to Sunday night;
and, (e) at the end of the tenth day, the close of
the laboratory.


RESULTS 
Stable Mood Differences in Level 


Inspection of mean responses in Figure 1
revealed clear and consistent differences in
the reported level of the nine moods.
Concentration, activation, social affection, and 
pleasantness remained during all periods be- 
tween means of 2.3 and 3.3; anxiety was 
less frequently experienced; while egotism, 
aggression, and depression never rose above 
1.7. 

No analysis was made of these absolute
differences since the three levels were obvious;2
rather, repeated measurements analyses
were run of each mood factor to see whether
the shifts during the laboratory were significant.3


Mood Shifts 


Concentration rose and declined significantly
at the 1% level of confidence according
to the appropriate F test. Likewise, the steady
decline in skepticism was significant at the
1% level. Significant fluctuations at the 5%
level of confidence were obtained for activation
and depression, while the overall fluctuations
of the other five factors by all trainees
combined could be accounted for by chance.


Absolute versus Relative Interpretation 


A folklore has grown among management 
laboratory trainers. Our results tend to sup- 
port some, but not all of these beliefs. For 
example, much is made of the initial high 
level of anxiety most trainees are supposed to 
experience. Figure 1 suggests that the absolute 
anxiety level is not particularly high (as- 
suming the absolute scale is a valid indicator). 

2 Actually, these level differences might be due
simply to the differential social desirability of the
mood factors and their associated adjectives.

3 Analyses were designed and executed by George
Dunteman.


However, in line with trainer beliefs, what- 
ever initial anxiety occurred seems to have 
diminished considerably by the end of the 
laboratory. The mean drop from the first to 
the last session (2.3 to 1.8) was 11 times 
the standard error of the mean, estimated 
from the within-subjects variance in anxiety. 

Again, the absolute level of initial skepticism 
was not as high as trainers might have ex- 
pected, possibly because most trainees in this 
lab had already discussed participation with 
former trainees back at the plant; but the 
steady, significant, decline from a mean of 2.1 
to a mean of 1.3 is in accordance with trainer 
expectations. 

Coming back from a weekend rest affected
trainees on Sunday night as trainers usually
infer from observation: depression increased,
while concentration and activation were reduced.


Victory, Defeat, and Mood 


Of particular interest was the impact of 
the intergroup competition near the end of 
the first week of the laboratory. As Blake and 
Mouton (1961) note, the intergroup is an 
involving experience where matched training 
groups suddenly are pitted against each other 
ostensibly to see which is the best group. 
Previously free of any assigned tasks, each 
group must now produce an essay, then try 
to convince the opposing training group of 
its superiority. Despite much debate and 
balloting, each group continues to favor its 
own product. Only when impartial judges are
brought in, is a clear victory experienced
by one group and clear defeat by another.

FIG. 1. Changes in mood during course of the laboratory.
(Vertical axis: mood at the moment, from “Definitely Not” to
“Definitely Feel This Way”; horizontal axis: time of administration.)

TABLE 1
SIGNIFICANT MEAN CHANGES IN MOODS AS A CONSEQUENCE OF
VICTORY AND DEFEAT FOR 15 WINNERS AND 15 LOSERS

Mood              Before competition began   After victory or defeat   Difference
Concentration
  Winners
  Losers
Aggression
  Winners
  Losers
Pleasantness
  Winners
  Losers
Depression
  Winners
  Losers
Skepticism
  Winners
  Losers

(Cell values are not legible in the source, apart from the fragments “16,” “3,” and “2.92” in the Concentration rows.)

As seen in Figure 1, conforming to trainer 
observations, the intergroup competition suc- 
ceeded in increasing concentration and activa- 
tion. Social affection was at a low point at 
the end of the struggle and aggression at its 
high. 

A more detailed analysis of variance of the 
shift from before-to-after the intergroup com- 
petition by winners and losers showed sig- 
nificantly different patterns of shift in ag- 
gression, depression, and pleasantness. Win- 
ners or losers also shifted differentially in 
skepticism and concentration. 

Table 1 shows the significantly different (at 
the 1% level) patterns of shifting in aggres- 
sion by winners and losers. The results sup- 
port the frustration-aggression hypothesis, but 
mainly in one direction. Winning does not
seem to reduce reported feelings of aggression
anywhere near as much as losing increases feelings
of aggression (F = 5.8).

The pattern of change in depression and
pleasantness shown in Table 1 reflects the
involvement of trainees in winning or losing
in group competition and corroborates Blake
and Mouton’s (1961) observations that
training groups suffering defeat do a great
deal of soul-searching while victory makes
groups “fat and happy, content to rest on
their laurels.” The interaction patterns for
depression and pleasantness yielded highly significant
Fs of 17.5 and 20.0 when the variance
due to before versus after means was
contrasted with the variance within subjects.

Concentration significantly increased, but 
only for losers, while skepticism significantly 
decreased (at the 1% level) from before to 
after the competition for both winners and 
losers, but particularly (and significantly) for 
winners. These shifts conform to observers’ 
impressions that trainees value highly the 
intergroup experience as a means of learning 
about the dynamics of groups in conflict; at
the end, losers feeling a greater sense of
effort; winners feeling less skepticism about
laboratory procedures.


Ambiguous Adjectives 


Only one factor, egotism, of the nine scored, 
failed to fluctuate from one point in the 
laboratory to another. It showed no particular 
change as a consequence of initial impact of 
the laboratory, success or failure in competition,
or “Sunday night blues”; nor was there any
end spurt which could not be accounted for
by chance fluctuations. Observation of the
error variance suggests the possibility that the
adjectives (perhaps mainly “egotistic” and
“self-centered”) yielding the egotism score


were ambiguous terms for these subjects.
Actually a highly significant F of 13.6 was
obtained for fluctuations in egotism when the
orientation of the subjects was taken into account
by classifying them into three equal
groups according to their scores on the Orientation
Inventory (Bass, 1962a). Self-oriented
and task-oriented subjects declined in egotism
as a consequence of defeat as might have been
expected for all subjects, but interaction-oriented
subjects did not; they increased in
egotism. The failure of any other mood
factors to show the pattern suggests that the
group of interaction-oriented subjects, composed
mainly of less educated, lower echelon
supervisors, misunderstood the meaning of
the egotism adjectives.4

4 Nowlis confirms the need to guard against
vocabulary problems when employing the check list
with subjects lacking in high school education.


REFERENCES 
Bass, B. M. Leadership, psychology and organizational
behavior. New York: Harper, 1960.

Bass, B. M. The Orientation Inventory. Palo Alto:
Consulting Psychologists, 1962. (a)

Bass, B. M. Reactions to Twelve Angry Men as a
measure of sensitivity training. J. appl. Psychol.,
1962, 46, 120-124. (b)

Blake, R., & Mouton, J. S. Competition, communication
and conformity. In I. A. Berg and B. M.
Bass (Eds.), Conformity and deviation. New York:
Harper, 1961.

Green, R. F., & Nowlis, V. A factor analytic study
of the domain of mood with independent validations
of the factors. Technical Report No. 4,
1957, University of Rochester, Contract Nonr
668(12).

Tannenbaum, R., Weschler, I. R., & Massarik, F.
Leadership and organization. New York: McGraw-Hill,
1961.

(Received November 13, 1961)





Journal of Applied Psychology
1962, Vol. 46, No. 5, 365-369


THE RELATIVE IMPORTANCE OF VISUAL AND AUDITORY 
FEEDBACK IN SPEED TYPEWRITING 


M. JOAN DIEHL 1 AND R. SEIBEL

International Business Machines Research Center, Yorktown Heights, New York


16 skilled typists took speed typing tests under 4 different conditions: (a)
normal speed typing conditions; (b) normal, but no sight of the printed
line; (c) normal, but the sound of the typewriter was masked by noise
fed through earphones worn by the typists; and (d) a combination of Conditions
b and c (neither visual nor auditory feedback). The effects of these
conditions were measured in terms of speed and accuracy of typing. Some
differences among the conditions may be considered statistically significant
(p ≤ .05 on a per comparison basis), but the small magnitudes of the differences
suggest: in speed typing situations the presence or absence of visual and/or
auditory feedback has a relatively unimportant effect on speed and accuracy
of typing.


A review of the abundant published litera- 
ture on typewriting reveals that the relative 
importance of visual (sight of the printed 
line) and auditory (sound of the printing 
action) feedback in typewriting has not been 
experimentally explored. The electric type- 
writer makes possible the separation of key- 
board and printing mechanism, and certain 
data processing systems may call for this 
separation to be one of many feet, or even 


miles. An interest in quieter offices may also 
call for the removal of the noise producing 


printing mechanism to some remote and 
sound isolated location. Thus, it now becomes 
important to ascertain the effects of remov- 
ing the visual and auditory feedback. 

The auditory feedback has always been 
available to the typist, and it seems reason- 
able to suppose that the typist utilizes the 
auditory cues to help her maintain rhythm 
and detect certain kinds of mistakes. Visual 
feedback has been available to the typist since 
prior to the turn of the century, and the in- 
vention of the typewriter which permitted the 
typist to see the line of type as it was being 
generated has been heralded as one of the 
greatest advances in the history of the type- 
writer. Since history, tradition, and habit 
favor the retention of these forms of feedback, 
it behooves one to understand their effects 
before one proposes a system which deprives 
the operator of one or both of these types of 


cues. 


1 Now at Columbia University.


The present experiment attempts to deter- 
mine the effects on speed typing of depriving 
skilled operators of visual and/or auditory 
feedback. The effects on other typing jobs, 
unskilled typists, attitudes, morale, and other 
such factors are beyond the scope of this 
experiment, and generalizations from the 
present results do not seem warranted. Further
experimentation is needed in order to
understand these additional aspects of the
situation.


METHOD 
Apparatus 


An IBM Executive, Model B, typewriter (Serial 
Number 160779) was used. It was equipped with a 
16-inch carriage, standard executive keyboard, and a 
Number 2 platen. 

Four experimental conditions were employed: no
visual feedback, no auditory feedback, neither visual
nor auditory feedback, and normal typing.

Masking of visual feedback was achieved with
a piece of cardboard approximately 31 inches long
and 18 inches wide, placed so as to prevent the
typist from viewing the typewriter carriage and
typewritten sheet, though not interfering with sight
of the typewriter keyboard.

Masking of auditory feedback was achieved by
means of a masking noise presented through a
double earphone headset (Model Trimm Receiver
ANB-H-1) equipped with sponge rubber ear cushions
(Military Type MC-114). The noise was an open
field monaural tape recording (using an Ampex,
Model 350, tape recorder with an Altec, Model
633A, dynamic type microphone) of four typists
typing simultaneously on four IBM electric typewriters.
The intensity level of the earphone signal
was sufficiently high so that it was subjectively
judged by all subjects (Ss) to adequately mask the
sound of the key strokes and the bell of the experimental
typewriter when it was in operation. The
masking noise was read from the tape via a Tape-O-Matic,
Model 711, tape recorder, and the signal at
the headset produced an average needle deflection
of 4 volts (plus or minus 1 volt) when measured
with a 1,000-ohm-per-volt (Simpson Model 260)
ac voltmeter. An end-of-line cue, normally supplied
by the bell, was provided by turning off the
masking noise for an interval of 10 single unit spaces
(three average letters on the IBM Executive) when
the typist reached a point near the right hand margin.
The start of this point coincided with the point at
which the bell rang.

Masking of visual and auditory feedback was
accomplished by simply combining the two individual
masking techniques.

For all conditions, 230 single unit spaces (77
average letters) were allowed for each typewritten
line. The end-of-line cue (bell, or cessation of noise)
occurred 30 single space strokes (10 average letters)
before the limits for a line were reached. Typing
beyond these limits (the right hand margin does
not stop the typewriter) was scored as an error.

Timing of all tests was accomplished via a Standard
Electric timer, Type S-6, and was generally accurate
within a second or two for the experimental conditions.

Twenty-four different standardized 5-minute speed
typing tests were used for the test material. These
tests were published by Today’s Secretary.2 In a
pilot study using two IBM secretaries, no obvious
differences were found between individual tests;
hence, the tests were randomly distributed over all
experimental conditions.

2 This is a monthly publication and each issue
contains one test entitled “Competent Typist Test.”
The authors of these tests vary from month to
month, as do the pages on which they appear. The
24 tests for 1956 and 1957 are the tests used in
this study.


Procedure


The Ss were 16 female IBM secretaries, stenographers
or typists from two different IBM locations.
Their mean typing speeds on an IBM Executive
typewriter under normal speed typing conditions
ranged from 45 to 80 net (number gross words minus
2 times number of errors) words per minute
(NWPM). These 16 Ss were assigned at random
into four equal groups of four Ss each.

Prior to the first experimental session each S
was given, under normal typewriting conditions,
four 5-minute speed typing tests similar to those
used in the experiment proper. On her first day of
testing, and again on her second day of testing,
each S typed under all four experimental conditions:
normal typewriting (N), visual masking (V),
auditory masking (A), and visual and auditory
masking combined (VA). Each of the groups of
Ss proceeded through the four experimental conditions
in a different order with experimental conditions
counterbalanced for order. The experimental
design, adapted from Lindquist (1953), is outlined in
Table 1. The four orders selected (out of the possible
24) were chosen so that only one condition
involving A or VA occurred during each half of a
day’s run. A day’s run consisted of 12 5-minute
tests, 3 for each condition. Rests between tests, but
within conditions, lasted 3 minutes. Rests between
conditions lasted 5-15 minutes. At least 24 hours,
and no more than weeks, elapsed between Days
1 and 2. One-half of the 24 Competent Typist Tests
were randomly distributed for the Day 1 runs and
the other half for the Day 2 runs. Thus, each S
took a total of six 5-minute speed tests under each
of the four experimental conditions. Since all 16 Ss
took all four experimental conditions, each condition
is evaluated in terms of the results of 8 hours
of speed typing tests.

TABLE 1
EXPERIMENTAL DESIGN

Group   Order
1       Visual; Visual and auditory; Normal; Auditory
2       Visual and auditory; Visual; Auditory; Normal
3       Normal; Auditory; Visual; Visual and auditory
4       Auditory; Normal; Visual and auditory; Visual

Prior to the initial introduction of Conditions A
or VA, each S was given approximately 5 minutes
of practice under A or VA conditions. Thus, Ss
in Groups 2 and 4 received 5 minutes of practice
under A or VA conditions prior to their first
experimental typing condition. The Ss in Groups
1 and 3 were each given 5 minutes of practice under
N conditions prior to their first experimental typing
condition, as well as 5 minutes of practice under A
or VA conditions prior to their second experimental
condition.

Before beginning to actually typewrite, each S was
instructed to single space between lines and double
space between paragraphs on each test, to listen for
the appropriate cue for change of line, and to
regulate her typing speed so that she would be
making approximately five errors per 5-minute test.
When the appropriate experimental condition was
set up, each S was comfortably seated at the typewriter
with the appropriate test placed in a convenient
position. The S was then given a “ready”
signal and then a “start” signal by the experimenter
(E). At the end of 5 minutes, S was instructed
to “stop.” The E immediately corrected the test,
informed S of her error rate, and instructed her
to aim for either speed or accuracy in the following
test. If, for example, S made one or two errors in
a 5-minute test, E instructed her to aim for speed
in the following test. If S made 10-15 errors in a
test, she was instructed by E to reduce her gross
words per minute (GWPM) and aim for accuracy
in the next test. When S made approximately five
errors in a 5-minute test, E instructed S to maintain
her present speed and error rate in the following
test. While S took the next test, E computed
the NWPM of the preceding test and S was given
feedback of her error rate, NWPM, and GWPM
after the completion of the ongoing test.
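The counterbalancing constraint stated in the Procedure (only one condition involving A or VA in each half of a day's run) can be checked by enumerating the 24 possible orders. The sketch below is illustrative only: the function and variable names are hypothetical, and it does not attempt to reproduce how the authors picked their four orders.

```python
from itertools import permutations

CONDITIONS = ("N", "V", "A", "VA")
EARPHONE = {"A", "VA"}  # conditions that use the masking noise


def one_earphone_condition_per_half(order):
    """True if each half of the run contains exactly one of A or VA."""
    first_half, second_half = order[:2], order[2:]
    return (
        sum(c in EARPHONE for c in first_half) == 1
        and sum(c in EARPHONE for c in second_half) == 1
    )


all_orders = list(permutations(CONDITIONS))
eligible = [o for o in all_orders if one_earphone_condition_per_half(o)]

print(len(all_orders), "possible orders;", len(eligible), "satisfy the constraint")
```

Under this reading of the constraint, 16 of the 24 orders qualify, so choosing four of them that also balance each condition across the four serial positions still leaves the experimenter some freedom.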


RESULTS 


TABLE 2
ANALYSIS OF VARIANCE FOR THE GROSS
WORDS PER MINUTE (GWPM)

Source                                      Legible value
Between subjects                            5,181.0
  Conditions × Order (C × O) (between)      1,466.3
  Error (E) (between)                       13,714.7
Within subjects                             3,126.3
  Feedback conditions (C)                   22.9
  Order (O)
  Practice (P)
  C × O (within)
  C × P
  O × P
  C × O × P (between)
  C × O × P (within)
  E (within)
  E1 (within)
  E2 (within)
  E3 (within)
Total

* p < .05
(The df column and the remaining entries are not legible in the source.)


TABLE 3
ANALYSIS OF VARIANCE FOR THE NET WORDS PER
MINUTE (NWPM) TYPING SCORES

Source                                      Legible value
Between subjects
  Conditions × Order (C × O) (between)
  Error (E) (between)
Within subjects
  Feedback conditions (C)
  Order (O)
  Practice (P)
  C × O (within)
  C × P
  O × P
  C × O × P (between)
  C × O × P (within)                        2,311
  E (within)                                20,799.6
  E1 (within)                               3,294.9
  E2 (within)                               4,406
  E3 (within)                               13,098.3
Total                                       57,902

Note. NWPM equals GWPM minus 2 times number of errors.
* p < .05
(The df column and the remaining entries are not legible in the source.)


For every error made in each 5-minute
test, two words per minute were subtracted
from the GWPM, resulting in the NWPM
score. A penalty of one error was given for
each of the following: every misspelled word
(although no more than one error was given
per misspelled word), every word or punctuation
omitted or committed, neglecting to
indent at the beginning of a paragraph, failure
to single space between lines or double space
between paragraphs, neglecting to single space
between words or double space between sentences,
false anticipation of a cue for change
of line, and typing beyond the right hand
margin.
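The net-score rule for a 5-minute test (two words per minute subtracted from GWPM for every error) can be written out as a short sketch. The function names and the sample figures are hypothetical; only the rule itself comes from the text.

```python
TEST_MINUTES = 5        # every test in the study lasted 5 minutes
PENALTY_PER_ERROR = 2   # words per minute subtracted per error


def gross_wpm(gross_words):
    """Gross words per minute (GWPM) for one 5-minute test."""
    return gross_words / TEST_MINUTES


def net_wpm(gross_words, errors):
    """Net words per minute (NWPM): GWPM minus 2 for every error."""
    return gross_wpm(gross_words) - PENALTY_PER_ERROR * errors


# Hypothetical test: 365 gross words typed with 7 errors.
print(gross_wpm(365))   # 73.0
print(net_wpm(365, 7))  # 59.0
```

Because the penalty is fixed per error, a typist who trades a few extra errors for speed can raise GWPM while lowering NWPM, which is why the two scores are analyzed separately.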

Analyses of variance (see Lindquist, 1953,
pp. 297-301) for GWPM and NWPM are
presented in Tables 2 and 3, respectively.
The effects of Feedback conditions were
significant at the .05 level. The effects of
Practice during the experiment, and Order,
and all interaction effects were not significant.
Figures 1 and 2 give the means for the four
Feedback conditions for the GWPM and
NWPM scores for each S (identified by the
numbers 1 through 16), for group means
(identified by AVER), and the grand means.





FIG. 1. Means of six 5-minute Gross-Words-Per-Minute scores for each subject
(identified by Numbers 1 through 16) under each of the four typing conditions.
(Also shown are the means for each of the four groups, identified by AVER,
and the four grand means.)

FIG. 2. Means of six 5-minute Net-Words-Per-Minute scores for each subject
(identified by Numbers 1 through 16) under each of the four typing conditions.
(Also shown are the means for each of the four groups, identified by AVER,
and the four grand means.)

Examination of S means and group means
reveals no obviously consistent differences
among the four Feedback conditions, for
either GWPM or NWPM scores. However,
the grand means for both GWPM and
NWPM show that performance under the
conditions which employed earphones was
always somewhat poorer than under N or V
masking, though not all of the comparisons
involved were statistically significant.
Utilizing multiple t comparisons, with mean
differences tested against error terms based
on the Mean squares for Error3 (within),
the significance of all pair-wise comparisons
was evaluated. For the GWPM scores the
differences between N and A and between
V and A conditions are significant at the .05
and .01 levels, respectively. The other four
possible comparisons are not significant at the
.05 level. The four means are: N, 72.92; V,
73.42; A, 71.87; and VA, 72.61 GWPM.
For the NWPM scores the N condition differs
from both conditions involving auditory masking
(A and VA) at the .05 level of confidence. The other four
possible comparisons are not significant at
the .05 level. The four means are N, 58.97;
V, 57.52; A, 55.25; and VA, 55.73 NWPM.
Since all six possible comparisons were made
for the four means (for each score), the
probability of a Type 1 error for the sets of
comparisons is actually greater than .05 (or
.01) by a factor of six.
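The remark about six comparisons can be made concrete with a small calculation. The sketch below counts the pairwise comparisons among four means and computes two familiar quantities. One is the additive (Bonferroni) upper bound, which matches the "factor of six" statement. The other is the error rate that would hold if the six tests were independent, which they are not, since they share means; it is shown only as a reference point. The function names are hypothetical.

```python
from math import comb


def n_pairwise(k):
    """Number of pairwise comparisons among k means."""
    return comb(k, 2)


def bonferroni_bound(alpha, m):
    """Upper bound on the chance of at least one Type 1 error in m tests."""
    return min(1.0, alpha * m)


def rate_if_independent(alpha, m):
    """Chance of at least one Type 1 error if the m tests were independent."""
    return 1 - (1 - alpha) ** m


m = n_pairwise(4)  # 6 comparisons among N, V, A, and VA
print(m, bonferroni_bound(0.05, m), round(rate_if_independent(0.05, m), 3))
```

Read this way, "greater by a factor of six" is best taken as a ceiling on the set-wise error rate (.30 for the .05 level) rather than as its exact value.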


DISCUSSION 


The lack of consistency in the data, as
depicted in Figures 1 and 2, suggests that
even though performance under some of the
experimental conditions may be considered
significantly different than that under others,
these differences should be interpreted with
caution. Though each experimental condition
was evaluated with a total of 8 hours of
speed typing tests, it must be kept in mind
that each S typed for just half an hour under
each of the conditions; and the masking conditions
were quite novel to all Ss.

It should be pointed out that the effects of
diminished feedback or the introduction of
“novelty” into the N situation might provoke
a significant morale and attitude problem.
Initially, most of the typists and secretaries
resisted participation in the experiment because
of the general feeling that they “could
not possibly type well in presence of the noise
used to mask the auditory cues from the
typewriter, and that they would type off the
page if they could not glance once in a while
at the typewritten sheet.” In order to overcome
this negative attitude, preliminary sessions
were held during which each S typed,
under N conditions, four 5-minute standardized
typing tests similar to the ones used in
the experiment proper. During this time it
was possible for E to establish rapport with
Ss and to allay their apprehension concerning
the novel typing conditions, thus changing
their attitude from one of resistance to one
of cooperation.

The majority of studies of typewriting to
date have either used as Ss novice or
championship typists; only a relatively small
number of studies have been concerned with
that in-between group, the skilled operators
usually found in the business world, operators
whose speed ranges between 45 and 80
NWPM. The present experiment used this
latter group since the results obtained with
this group are of more practical interest if
auditory and/or visual cues are to be eliminated
in the typewriters of the future.


CONCLUSION 


The average number of GWPM typed was
just over 70, and the largest difference between
one condition and another was approximately
1.5 words per minute. The
average number of NWPM typed was between
55 and 60, and the largest difference
between one condition and another was just
under 4 words per minute. While statistical
analyses indicated that some of the differences
among the conditions may be considered
statistically significant (at the .05 level or
better on a per comparison basis), the small
magnitudes of the differences suggested the
following conclusion: in a speed typing situation
the presence or absence of visual and/or
auditory feedback has a relatively unimportant
effect on speed and accuracy of
typing.


REFERENCE

Lindquist, E. F. Design and analysis of experiments
in psychology and education. Boston:
Houghton Mifflin, 1953.

(Received December 1961)





Journal of Applied Psychology 
1962, Vol. 46, No. 5, 370-374 


SOME DIFFERENTIAL EFFECTS OF RACE OF RATER
AND RATEE ON EARLY PEER RATINGS OF
COMBAT APTITUDE 1


JOHN E. deJUNG 2 AND HARRY KAPLAN


Army Personnel Research Office, Department of the Army, Washington, D. C. 


Peer ratings of combat potential made by 669 Caucasian and Negro army 
recruits of their squad members were examined. The hypothesis that ratees 
would receive higher ratings from members of their own race than from 
members of another race was supported (p< .05) for all ratee samples. The 
hypothesis that raters would give higher ratings to men of their own race 
than to men of another race was supported for Negro raters but not for 
Caucasian raters. The obtained significant differences appeared basically a 
reflection of higher ratings received by Negroes from Negroes. For practical 
purposes the effect on the recruit’s average rating score was negligible because 
of the preponderance of Caucasian raters within squads exhibiting no rating
preferences.


The Human Factors Research Branch of 
The Adjutant General’s Office has recently 
developed an experimental combat rating 
scale designed to measure the Army recruit’s 
potential as a combat soldier. Assignments 
of recruits to squads within training com- 
panies for basic military training are made 
without reference to the assignee’s ethnic or 
racial background. Buddy ratings are com- 
puted as averages of all the ratings made of 
the recruit by his squad members. Although 
a working integration of varying ethnic social 
backgrounds has clearly been achieved within 
squads from a practical point of view, a 
closer examination of the intrasquad rating 
behaviors of squad members is pertinent 
both in assessing the validity of user accept- 
ance of the experimental rating instrument 
and in terms of the wider research question 
of operant race bias. 

The present inquiry into differential effects 
of race of rater and of ratee is limited to 
recruit populations of English speaking Caucasians
and Negroes. Two basic questions
are posed for each racial grouping: Do ratees
receive higher ratings from members of their
own race than they do from members of
another race? Do raters give higher ratings
to members of their own race than they do
to members of another race?

1 A portion of this paper was presented at the
convention of the American Psychological Association,
Chicago, September 1960. The opinions expressed
in this paper are those of the authors.
Publication does not imply Department of Defense
indorsement of factual accuracy or opinion.

2 Now with the Parsons Research Project, Bureau
of Child Research, University of Kansas.


PROCEDURE 


The combat aptitude rating scale was administered
to recruits in basic training at a midwestern
Army installation as part of a larger coordinated 
research effort to improve initial selection of Army 
enlisted men for the Combat Arms. The rating 
procedure involved an alternating ranking—first 
best, then poorest, then second best, second poorest, 
etc.—followed by a rating using a seven-step scale 
with forced agreement between the rankings and the 
ratings. In each squad of about 16 recruits each 
man, excepting those absent from the rating session 
due to conflicting assignments, rated each other man 
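The alternating ranking and the forced agreement between ranks and ratings can be illustrated with a minimal sketch. The function names are hypothetical, and two details are assumptions not stated in the text: that a higher scale value means a better rating, and that tied ratings are allowed for adjacent ranks.

```python
def alternation_order(n):
    """Rank positions in the order they are filled: best, poorest, second best, ..."""
    order, low, high = [], 1, n
    while low <= high:
        order.append(low)
        if low != high:
            order.append(high)
        low, high = low + 1, high - 1
    return order


def ratings_agree_with_ranking(ratings_in_rank_order, steps=7):
    """True if ratings on a 1..steps scale never increase from rank 1 downward.

    Assumes higher values are better and that ties between ranks are allowed.
    """
    in_range = all(1 <= r <= steps for r in ratings_in_rank_order)
    monotone = all(
        a >= b for a, b in zip(ratings_in_rank_order, ratings_in_rank_order[1:])
    )
    return in_range and monotone
```

For a five-man list, `alternation_order(5)` fills rank positions 1, 5, 2, 4, 3 in that order.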

The combat rating scale operationally provides 
a numeric statement of the ratee’s potential as a 
combat soldier as perceived by each of his fellow 
squad members. The recruit’s rating score is the 
average of all ratings he receives. Reliability estimates
in the .80s³ have been reported for these
rating scores based on a 1,000-man recruit sample 
(Birnbaum, Rosenberg, & White, 1957a, 1957b; 
Willemin & Rosenberg, 1957; Willemin, Rosenberg, 
& White, 1957). Correlation coefficients between this 
fifth-week average rating and similar average ratings 
obtained 3 months and nearly 1 year later of .62 
and .57, respectively, suggest that these ratings re- 
flect a somewhat stable ratee behavioral component 
over raters and over time (Willemin & Karcher, 
1958). 
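As an illustration of how a reliability estimate of this kind might be obtained, the sketch below correlates subaverages from random halves of each ratee's raters and steps the result up for double length with the Spearman-Brown formula. The data layout (one list of received ratings per ratee) and all function names are assumptions made for the example, not the authors' computing procedure.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half, factor=2.0):
    """Step a half-length correlation up to the full-length reliability."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


def split_half_reliability(ratings_by_ratee, seed=0):
    """Correlate subaverages from random halves of raters, corrected for double length.

    ratings_by_ratee: one list of received ratings per ratee (hypothetical layout);
    each ratee needs at least two ratings.
    """
    rng = random.Random(seed)
    first, second = [], []
    for ratings in ratings_by_ratee:
        shuffled = list(ratings)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        first.append(sum(shuffled[:half]) / half)
        second.append(sum(shuffled[half:]) / (len(shuffled) - half))
    return spearman_brown(pearson_r(first, second))
```

A half-length correlation of .50, for example, corresponds to a corrected estimate of about .67.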

The analysis sample consisted of all English speaking
Caucasian and Negro recruits receiving ratings
from both Caucasian and Negro raters in their fifth
week of basic training. For each man, ratings received
from Caucasian and Negro raters were averaged
separately. Approximately half of the original
1,300-man ratee sample had no Negro ratings and
were dropped from this study. This restriction
caused a considerably greater attrition of Negro than
of Caucasian ratees since in many squads where
there was only one Negro, that Negro was unusable
as a ratee (he received no ratings from Negroes)
while all of his Caucasian buddies were usable
ratees. The reverse situation of one Caucasian per
squad did not exist in our sample. The retained
ratees were divided into four analysis samples on the
basis of military component (Regular Army, RA,
versus inductee, US) and race (Caucasian versus
Negro).⁴ An examination by subsample of the
Caucasian ratings received by the unusable ratees and
those received by the retained ratees revealed no
mean differences between the unusable and retained
groups (critical ratios (CRs) of .39, .33, .69, and
.69, for the RA Caucasian, US Caucasian, RA
Negro, and US Negro comparisons, respectively).

3 These estimates were computed as correlations
between subaverages based on random halves of
raters and corrected for double length.

4 The separation of enlisted and inducted recruits
was maintained in all rating analyses due to
observed differences between these two component
groupings with respect to aptitude area scores and
such background variables as age and years of
education.


ANALYSIS AND RESULTS


It was hypothesized that ratees would receive
higher ratings from members of their
own race than from members of another race
(i.e., a Caucasian ratee would receive higher
ratings from his Caucasian raters than he
would from his Negro raters, and a Negro
ratee would receive higher ratings from his
Negro raters than he would from his Caucasian
raters). Differences between average
ratings received from raters of one’s own race
and from raters of another race were examined
for the four analysis ratee samples
separately in terms of one-tailed CRs for differences
between correlated means. In these
analyses each ratee served as his own control.
The assumption of homogeneity of variance
was supported for all four samples;
none of the F ratios was significant at the
.10 level.


TABLE 1

SUMMARY OF ANALYSIS OF AVERAGE RATING RECEIVED BY SAME RATEE
FROM CAUCASIAN AND FROM NEGRO RATERS

                                                             Ratee sample
                                                        RA                      US
Rater                                           Caucasian    Negro      Caucasian    Negro

Sample size                                        229         38          370     [illegible]
Average rating received from Caucasian raters     3.65        3.38        4.18     [illegible]
Average rating received from Negro raters         3.51        4.29        4.01     [illegible]
CR                                             [illegible]   4.07**    [illegible] [illegible]

 * p < .05, one-tailed test.
** p < .01, one-tailed test.

Mean differences significant at the .05 
level were found for all four samples. The 
results of these analyses and the extent of 
these differences are shown in Table 1. The 
hypothesis of higher ratings received from 
members of one’s own race than from mem- 
bers of another race was supported, indi- 
cating an operant racial bias with respect to 
combat ratings within both racial rating
groupings.
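The comparison described above, in which each ratee serves as his own control, can be sketched as follows. The layout of the ratings (pairs of rater race and rating for one ratee), the function names, and the use of the normal curve for the one-tailed probability are assumptions for illustration; with small samples a t reference distribution would be the more usual choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def averages_by_rater_race(received):
    """received: list of (rater_race, rating) pairs for one ratee (hypothetical layout)."""
    by_race = {}
    for race, rating in received:
        by_race.setdefault(race, []).append(rating)
    return {race: mean(values) for race, values in by_race.items()}


def correlated_means_cr(own_race_avg, other_race_avg):
    """Critical ratio for the mean of paired differences, with a one-tailed p value.

    Each position in the two lists refers to the same ratee.
    """
    diffs = [a - b for a, b in zip(own_race_avg, other_race_avg)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    cr = mean(diffs) / standard_error
    p_one_tailed = 1.0 - NormalDist().cdf(cr)  # large-sample approximation
    return cr, p_one_tailed
```

A positive CR indicates that the average rating received from raters of the ratee's own race exceeds the average received from raters of the other race.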

In an earlier study of leadership ratings 
reported by Cox and Krumboltz (1958) in- 
volving Air Force personnel, it was demon- 
strated that the evidence of significant mean 
rating differences did not preclude larger 
areas of interrace rater agreement and over- 
lap in the ratings which members of each 
race gave to members of another race. The 
within ratee sample covariation of Caucasian 
and Negro ratings in the present study was 
examined in terms of product-moment cor- 
relation coefficients computed separately for 
each ratee sample. The two average ratings
obtained for each ratee, i.e., the average
based on ratings made by Caucasian raters
and the average based on ratings made by
Negro raters, correlated .52 for the RA
Caucasian sample, .52 for the US Caucasian
sample, .42 for the RA Negro sample, and
.47 for the US Negro sample. Apparently a
moderate degree of Caucasian-Negro rater
agreement with respect to relative ordering of
assigned combat ratings within squads obtained
for the four ratee samples. With respect
to interracial overlap of ratings, Caucasians
rated 27% of the Negroes higher
than the average rating they gave men of
their own race, and the Negroes in the study
rated 29% of the Caucasians higher than the
average rating they gave to men of their own
race. This percentage of overlap indicates
that existing rating leniencies favoring same-race
ratees are certainly far from complete.
It was further hypothesized that raters
would give higher ratings to men of their
own race than to men of another race: i.e., a
Caucasian rater would give higher ratings to
his Caucasian ratees than he would to his
Negro ratees, and a Negro rater would give
higher ratings to his Negro ratees than he
would to his Caucasian ratees. To control for
initial race differences in ratings a multiple
covariance design was planned. However, a
preanalysis comparison of the first-order and
multiple correlation coefficients of proposed
“control variable” scores⁵ with the average
combat rating computed on a racially heterogeneous
recruit sample (N = 1,476) suggested
no practical gain in precision of the
multiple over the single variable regression
model. The experimental combat infantry
aptitude area score, IN, was finally selected
as that control variable with the highest obtained
correlation (.40) with the average
combat rating.

5 In effect all of the several potential control
variable scores examined were weighted two-test
composites of standardized and experimental tests
administered to recruits after induction. The
selected control variable is a composite of a
self-description questionnaire and the Arithmetic
Reasoning test of the Army Classification Battery.

The results of the analyses of covariance
are presented in Table 2. With the exception
of the comparison of ratings made by
Caucasian raters of RA ratees (F = 2.59,
p = .15 approximately), the Negro ratees
received significantly lower mean IN scores,
lower mean Caucasian ratings, and higher
mean Negro ratings than did the Caucasian
ratees.

The assumptions of homogeneity of variance
and of regression were examined for
the four analysis samples. Both of these
assumptions were clearly supported for all
samples with the single exception of the
rejection (F = 3.96, p = .06, approximately)
of the hypothesis of equal regression of
Negro ratings on IN score for the RA
Caucasian and RA Negro samples. This exception
is further evident in the deviant
negative correlation of -.048 between the
Negro rating and the IN score found for
the RA Negro group (see Table 2), further
indicating (a) that IN is inoperative as a
control variable for this group and (b) that
the sum of squares adjustment using the 
within “average” regression is inappropriate 
here. However, the large F ratios both for the 
initial analysis of variance (F = 11.92) and 
for the covariance analysis (F = 17.15) sug- 
gest that the failure to meet the assumption 
of homogeneity of regression is not a serious 
restriction here. 
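For readers unfamiliar with the single-covariate design, the following is a minimal sketch of a textbook analysis of covariance built from sums of squares and products, with the sum of squares adjusted by the pooled within-group ("average") regression. It is an illustration under stated assumptions, not the authors' computation: the function name, the dictionary layout, and the example values are hypothetical.

```python
def ancova_single_covariate(groups):
    """One-way analysis of covariance with one covariate.

    groups: dict mapping a group name to a list of (x, y) pairs, where x is the
    covariate (e.g., an aptitude score) and y is the average rating.
    """
    all_pairs = [pair for pairs in groups.values() for pair in pairs]
    n_total, k = len(all_pairs), len(groups)
    grand_x = sum(x for x, _ in all_pairs) / n_total
    grand_y = sum(y for _, y in all_pairs) / n_total

    # Total sums of squares and products about the grand means.
    sxx_t = sum((x - grand_x) ** 2 for x, _ in all_pairs)
    sxy_t = sum((x - grand_x) * (y - grand_y) for x, y in all_pairs)
    syy_t = sum((y - grand_y) ** 2 for _, y in all_pairs)

    # Pooled within-group sums of squares and products.
    sxx_w = sxy_w = syy_w = 0.0
    means = {}
    for name, pairs in groups.items():
        mx = sum(x for x, _ in pairs) / len(pairs)
        my = sum(y for _, y in pairs) / len(pairs)
        means[name] = (mx, my)
        sxx_w += sum((x - mx) ** 2 for x, _ in pairs)
        sxy_w += sum((x - mx) * (y - my) for x, y in pairs)
        syy_w += sum((y - my) ** 2 for _, y in pairs)

    # Adjust each sum of squares for regression on the covariate.
    ss_total_adj = syy_t - sxy_t ** 2 / sxx_t
    ss_within_adj = syy_w - sxy_w ** 2 / sxx_w
    ss_between_adj = ss_total_adj - ss_within_adj
    df_between, df_within = k - 1, n_total - k - 1
    f_ratio = (ss_between_adj / df_between) / (ss_within_adj / df_within)

    # Group means adjusted to the grand covariate mean via the within slope.
    b_within = sxy_w / sxx_w
    adjusted = {
        name: my - b_within * (mx - grand_x) for name, (mx, my) in means.items()
    }
    return {
        "F": f_ratio,
        "df": (df_between, df_within),
        "b_within": b_within,
        "adjusted_means": adjusted,
    }
```

The adjustment rests on a common within-group slope, which is why a failure of homogeneity of regression, as noted for the RA samples, makes the adjusted sum of squares harder to interpret.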

A comparison of the Caucasian and
Negro ratee samples in terms of adjusted
mean ratings made of them by Caucasian
raters yielded only nonsignificant differences
(p > .50) for both the RA and US groups.
However, differences significant at the .01
level were found between adjusted mean
ratings made by Negroes of Caucasian and
Negro ratees for both the RA and US
groups. In terms of adjusted group means,
Caucasian raters give similar ratings to Caucasian
and Negro ratees. Negro raters, on
the other hand, were found to give higher
ratings to Negro ratees than they give to
Caucasian ratees.

Referring to the four RA and to the four
US samples in Table 1 the average of ratings
made by Negroes of Negroes appears singularly
high. The covariance adjustments serve
toward equating (within the RA or US
samples) the mean ratings for the Caucasian-Caucasian,
Caucasian-Negro and Negro-Caucasian
rater-ratee combinations and, in
effect, further accentuate this pro-same-race
rating bias of the Negro rater. Quite apparently,
the obtained significant differences
here are basically a reflection of the higher
ratings received by Negroes from Negroes.


TABLE 2

[...] COVARIANCE OF AVERAGE COMBAT RATINGS FOR FOUR SAMPLES

                                    SS and products
Source                                   ΣX²

RA Caucasian raters
  Within Caucasian ratees               13071
  Within Negro ratees                    1755
  Pooled within races                   14826
  Between races                           667
  Total                                 15493

US Caucasian raters
  Within Caucasian ratees               20949
  Within Negro ratees                    1405
  Pooled within races                   22354
  Between races                          2024
  Total                                 24378

RA Negro raters
  Within Caucasian ratees               13071
  Within Negro ratees                    1755
  Pooled within races                   14826
  Between races                           667
  Total                                 15493

US Negro raters
  Within Caucasian ratees               20949
  Within Negro ratees                    1405
  Pooled within races                   22354
  Between races                          2024
  Total                                 24378

[remaining columns illegible]


In interpreting this phenomenon, the numeric
composition of the training squads by
race perhaps needs to be considered. In the
previously mentioned Air Force study by Cox
and Krumboltz (1958), the authors suggest
that “race-bounded friendships” tend to result
in raters giving higher ratings to men of
their own race. In the present study, where
each squad is composed mainly of Caucasians
with relatively few Negroes, it is reasonable
to expect that the Negro rater, in rating
other Negroes, is generally rating his closest
buddies. The Caucasian, on the other hand,
in rating members of his own racial group,
is rating nearly all of his squad members.
His possible preference and consequent higher
ratings given to a few “closer” buddies is lost
in his lower (average) rating of the remaining
Caucasian squad members. Operationally,
the effect is a rating bias on the part
of the minority group rater. The theoretical
question as to whether this likely buddy
selection on the basis of race and the higher
combat ratings given one’s own race members
is independent of numeric squad compositions
cannot be answered by this report.⁶
It should be further noted that these data
suggest a leniency in rating on the part of
the minority group in favor of their group
members rather than indicate a rejection or
prejudicial lower rating assigned to other race
members.

6 The confounding of race bias with [...] for
controlling or bringing about different racial
proportions within groups relegates questions of
relationship between racial preference and racial
composition to those more restrictive experimental
situations in which the numbers of various race
members within a group are manipulable by the
experimenter. The data in this present report
deals with [...] groupings, common to the Army
setting.


REFERENCES 


Birnbaum, A. H., Rosenberg, N., & White, R. K.
Validation of potential combat predictors: ZI results
from armor. USA TAGO Personnel Res. Br.
tech. res. Note, 1957, No. 78. (a)

Birnbaum, A. H., Rosenberg, N., & White, R. K.
Validation of potential combat predictors: ZI results
for combat engineer. USA TAGO Personnel
Res. Br. tech. res. Note, 1957, No. 79. (b)

Cox, J. A., & Krumboltz, J. D. Racial bias in peer
ratings of basic airmen. Sociometry, 1958, 21,
292-299.

Willemin, L. P., & Karcher, E. K., Jr. Validation
of early ratings against later ratings in the army’s
combat arms selection studies. Paper read at
American Psychological Association, Washington,
D. C., September 3, 1958.

Willemin, L. P., & Rosenberg, N. Validation of
potential combat predictors: ZI results for artillery.
USA TAGO Personnel Res. Br. tech. res.
Note, 1957, No. 77.

Willemin, L. P., Rosenberg, N., & White, R. K.
Validation of potential combat predictors: ZI results
for infantry. USA TAGO Personnel Res. Br.
tech. res. Note, 1957, No. 76.


(Received January 2, 1962) 











Latest news in Psychology... 
from McGraw-Hill 


@ READINGS IN INDUSTRIAL AND BUSINESS PSYCHOLOGY, 
Second Edition 


By HARRY W. KARN and B. von HALLER GILMER, both at Carnegie Institute of Tech- 
nology. McGraw-Hill Series in Psychology. 515 pages, $6.95 (cloth), $4.95 (paper). 


This collection of original papers represents current thought and research in the area of industrial and 
business psychology. Over 800 articles were examined before a final decision was made. The new 
edition contains more articles of an experimental and theoretical nature and for the first time includes 
selected readings in the areas of organizational behavior, engineering psychology, communications and 
the psychology of perception. 


@ HUMAN FACTORS IN TECHNOLOGY 


By E. BENNETT, J. DEGAN, and J. SPEIGEL, all of the Mitre Corporation, Bedford, 
Massachusetts. Available in October, 


Favorably received by the Human Factors Society, text was prepared with the assistance of outstanding 
human factors scientists and engineers representing a broad spectrum of disciplines and areas of investi- 
gation. Offers exciting view of some of the newest and most interesting aspects of contemporary hu- 
man factors in science and engineering. Will be useful to technically trained people in a wide variety 
of areas and specializations. Text is especially suitable for supplementary reading in graduate courses 
concerned with human factors in technology, or in upper level undergraduate courses in engineering or 
applied psychology and human engineering. 


@ STATISTICAL PRINCIPLES IN EXPERIMENTAL DESIGN 


By B. J. WINER, Purdue University. McGraw-Hill Series in Psychology. 672 pages, 
$12.50. 


This graduate-level text provides statisticians and experimental psychologists with basic principles used 
in the construction of experimental designs. Examined are designs found in current experimental litera- 
ture and those with unique and potentially useful features. Their advantages and disadvantages are 
studied in detail. Examples are drawn from areas of experimental, industrial and clinical psychology. 
@ PSYCHOLOGY: A Study of a Science 

Study II. Empirical Substructure and Relations With Other Sciences


Volume V. The Process Areas, The Person, and Some Applied Fields: 
Their Place in Psychology and in Science 


Volume VI. Investigation of Man as Socius: Their Place in Psychol- 
ogy and the Social Sciences 


Edited by SIGMUND KOCH, Duke University. Volumes V and VI available in January, 
1963. 


The fifth and sixth volumes in this vast, seven volume inquiry into the status and tendency of psychological
science. Study II seeks an increased understanding of the internal structure of psychological science,
and its place in the matrix of scientific activity.


send for your approval copies now 


McGRAW-HILL BOOK COMPANY, INC. 


330 West 42nd Street New York 36, N.Y. 











Two Fascinating New Books for Psychologists 
* PERCEPTION AND MOTION 


By Karl U. Smith, Ph.D., University of Wisconsin and William M. Smith, Ph.D., Dartmouth 


College. 


About 432 pages, about 193 illustrations—Just Ready! 


New answers to question of how patterned motion is regulated 





Here is a most unusual survey of the problems 
of behavior organization in man—considered 
in terms of sensory feedback mechanisms, 
both visual and auditory. The authors have 
developed a new theory toward explaining 
certain aspects of behavior, perception and 
motor learning. Their neurogeometric theory 
revolves around the concept that the brain is 
a differential detector responding to differ- 
ences in stimulation and sensory feedback— 
that the differential action within the nervous 
system is based on varying input to the 
neuron from several sources of stimulation 
rather than being based on the decision- 
making powers of the synapse. Other ele- 
ments of the theory hinge on perception and 
motion being considered as a single mecha- 
nism rather than as two separate forms of 
adaptation; and on motion itself being 
divided into postural, transport and manipu- 
lative components—each regulated by par- 
ticular stimulation patterns. 


Research techniques used to develop and 
support the authors’ contentions largely in- 
volve electronic methods of motion analysis 
and use of closed circuit television and 


videotape recordings. Findings and conclu- 
sions cast doubt upon many widely accepted 
concepts about the reflex, the synapse, and 
the nature of the learning process (particu- 
larly on the validity of the reinforcement 
theory). To supply background information 
for viewing their work in proper perspective, 
the authors briefly review historical develop- 
ment of the science of motion. 


Contents embrace the following discussions: 
Concepts of Perceptual-Motor Organization 
—The Science of Motion—Motion, Work and 
Machines—Early Experiments of the Spatial 
Problems of Behavior—Recent Studies of 
Displaced Vision—Critical Summary of Dis- 
placed Vision Studies—Experimental Foun- 
dations of Neurogeometric Theory—Use of 
Television in Behavior Research—Televised 
Inversion and Reversal of Visual Feedback-— 
Televised Angular Displacement of Visual 
Feedback—Size-Distorted Visual Feedback— 
Displaced Vision and Theory of Tool Using— 
Delayed Sensory Feedback—Infant Control 
of the Behavioral Environment—Effects of 
Displaced and Delayed Feedback—Neuro- 
geometry of Motion. 


* DELAYED SENSORY FEEDBACK AND BEHAVIOR 


By Karl U. Smith, Ph.D., University of Wisconsin. About 128 pages, illustrated—Just Ready! 
A new type of temporal analysis of behavior





This book is an elaboration of a portion of the 
book described above. It investigates the 
strange effects produced by an artificially 
induced time lag between performance of 
motion and perception of that motion. This 
investigation bears directly upon present day 
theories of learning and behavior; and gives 
new insight into human operation of com- 
plex machine systems. In this study, through 
use of a closed television circuit, the per- 
former’s actions are fed back to him pic- 
torially—but have been displaced spatially by 
manipulating the electronic circuits, the 
camera lenses, or the camera itself; or the 


image has been delayed by means of video- 
tape recordings. 


Contents are divided into six broad sections: 
Significance of Delayed Sensory in Behavior 
Research—Delayed Visual Feedback in Track- 
ing Performance—-Delayed Auditory Feed- 
back—Blind Performance with Delayed 
Pictorial Feedback—Concurrent Delayed Vis- 
ual Feedback Implications of Experiment. 


Under these are discussed such unusual
topics as delayed graphic feedback of handwriting,
artificial stutter, delayed guidance
feedback from remote cosmic systems:
theoretical considerations.


Teachers and students of psychology and education; industrial psychologists; certain researchers
in television techniques; time and motion engineers will all find valuable and thought-provoking
information in these two new books.





JLAP 


W. B. SAUNDERS COMPANY 


West Washington Square, Phila. 5, Pa. 


Please send me the following books and bill me: 
) Smith and Smith—Perception and Motion, about $9.00. 
[] Smith—Delayed Sensory Feedback and Behavior, about $6.00. 

















































